Abstract:

This paper presents a survey on some recent advances for the type I error rate control in multiple testing
methodology. We consider the problem of controlling the k-family-wise error rate (kFWER, probability to make k false
discoveries or more) and the false discovery proportion (FDP, proportion of false discoveries among the discoveries).
The FDP is controlled either via its expectation, which is the so-called false discovery rate (FDR), or via its upper-tail
distribution function. We aim at deriving general and unified results together with concise and simple mathematical
proofs. Furthermore, while this paper is mainly meant to be a survey paper, some new contributions for controlling
the kFWER and the upper-tail distribution function of the FDP are provided. In particular, we derive a new procedure
based on the quantiles of the binomial distribution that controls the FDP under independence.

1. Introduction

The problem of testing several null hypotheses has a long history in the statistics literature. With
the high-resolution techniques introduced in recent years, it has received renewed attention
in many application fields where one aims to find significant features among several thousands
(or millions) of candidates. Classical examples are microarray analysis [58, 17, 19, 20], neuroimaging
analysis [4, 42] and source detection [40]. For illustration, we detail below the case of
microarray data analysis.

1.1. Multiple testing in microarray data 

In a typical microarray experiment, the expression levels of a set of genes are measured under
two different experimental conditions and we aim at finding the genes that are differentially
expressed between the two conditions. For instance, when the genes come from tumor cells in the
first experimental condition, while they come from healthy cells in the second, the differentially
expressed genes may be involved in the development of this tumor and thus are genes of special
interest. Several techniques exist to perform a statistical test for a single gene, e.g. based on a
distributional assumption or on permutations between the two group labels. However, the number of
genes $m$ can be large (for instance several thousands), so that non-differentially expressed genes
can have a high score of significance by chance. In that context, applying the naive, non-corrected
procedure (level $\alpha$ for each gene) is unsuitable because it is likely to select (or "discover") a lot
of non-differentially expressed genes (usually called "false discoveries"). For instance, if the
$m = 10{,}000$ genes are not differentially expressed (no signal) and $\alpha = 0.1$, the non-corrected
procedure makes on average $m\alpha = 1{,}000$ discoveries, which are all false discoveries. In a more
favorable situation where there are only $m_0 = 5{,}000$ non-differentially expressed genes among
the $m = 10{,}000$ initial genes (50% of signal), the non-corrected procedure selects some genes,
say $r$ genes, for which the expected number of errors is $m_0\alpha = 500$. Since the number of
discoveries $r$ is not designed to be much larger than the number of false discoveries $m_0\alpha$, the final list
of discovered genes is likely to contain an unacceptable proportion of errors. A multiple testing procedure
aims at correcting a priori the level of the single tests in order to obtain a list of selected
genes for which the "quantity" of false discoveries is below a nominal level $\alpha$. The "quantity" of
false discoveries is measured by using global type I error rates, such as the probability to
make at least $k$ errors among the discoveries ($k$-family-wise error rate, $k$-FWER) or the expected
proportion of errors among the discoveries (false discovery rate, FDR). Finding procedures that
control type I error rates is challenging and is what we call here the "multiple testing issue".
Furthermore, a feature that increases the complexity of this issue is the presence of dependencies
between the single tests.
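The no-signal arithmetic above ($m\alpha = 1{,}000$ expected false discoveries) can be checked by simulation. The sketch below assumes, for illustration only, that the p-values are independent and exactly uniform under each true null; the function name and the seed are hypothetical choices.

```python
import random

def count_uncorrected_rejections(m, alpha, rng):
    """Count rejections of the naive rule "reject when p_i <= alpha" when all
    m nulls are true and the p-values are i.i.d. uniform on (0, 1)."""
    return sum(1 for _ in range(m) if rng.random() <= alpha)

rng = random.Random(0)  # fixed seed, for reproducibility only
m, alpha, n_rep = 10_000, 0.1, 50

# Average number of discoveries over n_rep simulated experiments.
average = sum(count_uncorrected_rejections(m, alpha, rng) for _ in range(n_rep)) / n_rep
print(average)  # close to m * alpha = 1000; all of them are false discoveries
```

Since no gene carries signal in this simulation, every rejection counted is a false discovery.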

Note that the multiple testing issue can arise in microarray analysis under other forms, for
instance when we search for co-expressed genes or genes associated with clinical covariates or
outcomes, see Section 1.2 of [17].

1.2. Examples of multiple testing settings 

Example 1.1 (Two-sample multiple t-tests). The problem of finding differentially expressed
genes in the above microarray example can be formalized as a particular case of a general
two-sample multiple testing problem. Let us observe a pair of independent samples

$$X = (X^1, \dots, X^n) = (Y^1, \dots, Y^{n_1}, Z^1, \dots, Z^{n_2}) \in \mathbb{R}^{m \times n},$$

where $(Y^1, \dots, Y^{n_1})$ is a family of $n_1$ i.i.d. copies of a random vector $Y$ in $\mathbb{R}^m$ and $(Z^1, \dots, Z^{n_2})$
is a family of $n_2$ i.i.d. copies of a random vector $Z$ in $\mathbb{R}^m$ (with $n_1 + n_2 = n$). In the context of
microarray data, $Y_i^j$ (resp. $Z_i^j$), $1 \le i \le m$, corresponds to the expression level measure of the
i-th gene for the j-th individual of the first (resp. second) experimental condition. Typically, the
sample size is much smaller than the number of tests, that is, $n \ll m$. Let the distribution $P$ of
the observation $X$ belong to a statistical model given by a distribution set $\mathcal{P}$. Assume that $\mathcal{P}$
is such that $X$ is an integrable random vector and let $\mu_{i,1}(P) = \mathbb{E} Y_i$ and $\mu_{i,2}(P) = \mathbb{E} Z_i$, for any
$i \in \{1, \dots, m\}$. The aim is to decide for all $i$ whether $P$ belongs to the set
$\Theta_{0,i} = \{P \in \mathcal{P} : \mu_{i,1}(P) = \mu_{i,2}(P)\}$ or not, that is, we aim at testing the hypothesis

$$H_{0,i} : \text{"}\mu_{i,1}(P) = \mu_{i,2}(P)\text{"} \quad \text{against} \quad H_{1,i} : \text{"}\mu_{i,1}(P) \neq \mu_{i,2}(P)\text{"},$$

simultaneously for all $i \in \{1, \dots, m\}$. Given $P$, the null hypothesis $H_{0,i}$ (sometimes called the
"null" for short) is said to be true (for $P$) if $P \in \Theta_{0,i}$, that is, if $P$ satisfies $H_{0,i}$. It is said to be false
(for $P$) otherwise. The index set corresponding to true nulls is denoted by
$\mathcal{H}_0(P) = \{1 \le i \le m : \mu_{i,1}(P) = \mu_{i,2}(P)\}$. Its complement in $\mathcal{H} = \{1, \dots, m\}$ is denoted by $\mathcal{H}_1(P)$. In the microarray
context, $\mathcal{H}_1(P) = \{1 \le i \le m : \mu_{i,1}(P) \neq \mu_{i,2}(P)\}$ is thus the index set corresponding to
differentially expressed genes. The aim of a multiple testing procedure is thus to recover the
(unobservable) set $\mathcal{H}_1(P)$ given the observation $X$. A multiple testing procedure is commonly based
on individual test statistics, by rejecting the null hypotheses with a "large" test statistic. Here, the
individual test statistic can be the (two-sided) two-sample t-statistic $S_i(X) \propto |\overline{Y}_i - \overline{Z}_i|$, rescaled
by the so-called "pooled" standard deviation. To provide a uniform normalization for all tests, it
is convenient to transform the $S_i(X)$ into the p-value

$$p_i(X) = \sup_{P \in \Theta_{0,i}} T_{P,i}(S_i(X)), \tag{1}$$

where $T_{P,i}(s) = \mathbb{P}_{X \sim P}(S_i(X) \ge s)$ is the upper-tail distribution function of $S_i(X)$ for $X \sim P \in \Theta_{0,i}$.
Classically, assuming that $Y_i$ and $Z_i$ are Gaussian variables with the same variance, we have for
any $P \in \Theta_{0,i}$, $T_{P,i}(s) = 2\,\mathbb{P}(Z \ge s)$, where $Z$ follows a Student distribution with $n - 2$ degrees of
freedom. In that case, each p-value $p_i(X)$ has the property to be uniformly distributed on (0, 1)
when the corresponding null hypothesis $H_{0,i}$ is true. Without making this Gaussian assumption,
p-values can still be built, as we discuss in Remark 1.3 below. Let us finally note that since the
$T_{P,i}$ are decreasing, a multiple testing procedure should reject nulls with a "small" p-value.

Example 1.2 (One-sided testing on the mean of a Gaussian vector). To give a further illustrating
example, we consider the very convenient mathematical framework for multiple testing where we
observe a Gaussian vector $X = (X_i)_{1 \le i \le m} \sim P$, having an unknown mean $\mu(P) = (\mu_i(P))_{1 \le i \le m} \in \mathbb{R}^m$
and an $m \times m$ covariance matrix $\Sigma(P)$ with diagonal entries equal to 1. Let us consider the
problem of testing

$$H_{0,i} : \text{"}\mu_i(P) \le 0\text{"} \quad \text{against} \quad H_{1,i} : \text{"}\mu_i(P) > 0\text{"},$$

simultaneously for all $i \in \{1, \dots, m\}$. We can define the p-values $p_i = \overline{\Phi}(X_i)$, where
$\overline{\Phi}(x) = \mathbb{P}(Z \ge x)$ for $Z \sim \mathcal{N}(0, 1)$. Any p-value satisfies the following stochastic domination under the null: if
$\mu_i(P) \le 0$, we have for all $u \in [0, 1]$,

$$\mathbb{P}(p_i(X) \le u) \le \mathbb{P}(\overline{\Phi}(X_i - \mu_i(P)) \le u) = u.$$

Additionally, more or less restrictive assumptions on $\Sigma(P)$ can be considered to model different
types of dependency of the corresponding p-values. For instance, we can assume that $\Sigma(P)$ has
only non-negative entries, that the non-diagonal entries of $\Sigma(P)$ are equal (equi-correlation) or
that $\Sigma(P)$ is diagonal. Finally, the value of the alternative means can be used for modeling the
"strength of the signal". For instance, to model that the sample size available for each test is $n$,
we can set $\mu_i(P) = \tau\sqrt{n}$ for each $\mu_i(P) > 0$, where $\tau > 0$ is some additional parameter.
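As a minimal sketch of Example 1.2, the one-sided Gaussian p-values can be computed from the standard normal upper tail, which equals `0.5 * erfc(x / sqrt(2))`. The function names below are illustrative and not taken from the paper.

```python
import math

def upper_tail_normal(x):
    """Return P(Z >= x) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def one_sided_p_values(x):
    """Map observations X_i to p-values p_i = P(Z >= X_i)."""
    return [upper_tail_normal(x_i) for x_i in x]

# Large observations give small p-values; negative observations give p-values above 1/2.
p = one_sided_p_values([0.0, 1.6448536269514722, -1.0])
print(p)
```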

Remark 1.3 (General construction of p-values). In broad generality, when testing the nulls $\Theta_{0,i}$
by rejecting for "large" values of a test statistic $S_i(X)$, we can always define the associated
p-values by using (1). It is well known that these p-values are always stochastically lower-bounded
by a uniform variable under the null, that is, $\forall i \in \mathcal{H}_0(P)$, $\forall u \in [0, 1]$, $\mathbb{P}(p_i(X) \le u) \le u$. This
property always holds, even when $S_i(X)$ has a discrete distribution. For completeness, we provide
this result with a proof in Appendix A. However, the calculation of the p-values (1) is not
always possible, because it requires the knowledge of the distribution of the test statistics under
the null, which often relies on strong distributional assumptions on the data. Fortunately, in some
situations, the p-values (1) can be approximated by using a randomization technique. The resulting
p-values can be shown to enjoy the same stochastic dominance as above (see, e.g., [44] for
a recent reference). For instance, in the two-sample testing problem, permutations of the group
labels can be used, which corresponds to using permutation tests (the latter can be traced back to
Fisher [25]).

1.3. General multiple testing setting
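To illustrate the randomization idea for a single gene, here is a hedged sketch of a Monte Carlo permutation p-value. Two choices are assumptions of this sketch rather than statements of the paper: the statistic is the unscaled absolute difference of group means, and the p-value uses the common "add one" convention `(1 + count) / (1 + n_perm)`.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(y, z, n_perm, rng):
    """Monte Carlo permutation p-value for one test, using |mean(y) - mean(z)|.

    The group labels are shuffled n_perm times; the p-value is the proportion of
    permuted statistics at least as large as the observed one, with the "+1"
    correction so that the result is never exactly zero.
    """
    observed = abs(mean(y) - mean(z))
    pooled = list(y) + list(z)
    n1 = len(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if stat >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

rng = random.Random(0)  # fixed seed, for reproducibility only
p_separated = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], 999, rng)
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3], 999, rng)
print(p_separated, p_identical)
```

In a multiple testing setting, this computation would be repeated for each of the $m$ coordinates.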

In this section, we provide the abstract framework in which multiple testing theory can be inves- 
tigated in broad generality. 

Let us consider a statistical model, defined by a measurable space $(\mathcal{X}, \mathfrak{X})$ endowed with a
subset $\mathcal{P}$ of distributions on $(\mathcal{X}, \mathfrak{X})$. Let $X$ denote the observation of the model, with distribution
$P \in \mathcal{P}$. Consider a family $(\Theta_{0,i})_{1 \le i \le m}$ of $m \ge 2$ subsets of $\mathcal{P}$. Based on $X$, we aim at testing the
null hypotheses $H_{0,i}$: "$P \in \Theta_{0,i}$" against the alternative $H_{1,i}$: "$P \in \Theta_{0,i}^c$" simultaneously for all
$i \in \{1, \dots, m\}$. For any $P \in \mathcal{P}$, let $\mathcal{H}_0(P) = \{1 \le i \le m : P \in \Theta_{0,i}\}$ be the set of the indexes $i$ for
which $P$ satisfies $H_{0,i}$, that is, the indexes corresponding to true null hypotheses. Its cardinality
$|\mathcal{H}_0(P)|$ is denoted by $m_0(P)$. Similarly, the set $\{1, \dots, m\}$ is sometimes denoted by $\mathcal{H}$. The set
of the false null hypotheses is denoted by $\mathcal{H}_1(P) = \mathcal{H} \setminus \mathcal{H}_0(P)$. The goal is to recover the set
$\mathcal{H}_1(P)$ based on $X$, that is, to find the null hypotheses that are true/false based on the knowledge
of $X$. Obviously, the distribution $P$ of $X$ is unknown, and thus so is $\mathcal{H}_1(P)$.

The standard multiple testing setting includes the knowledge of p-values $(p_i(X))_{1 \le i \le m}$ satisfying

$$\forall P \in \mathcal{P}, \ \forall i \in \mathcal{H}_0(P), \ \forall u \in [0, 1], \quad \mathbb{P}(p_i(X) \le u) \le u. \tag{2}$$

As a consequence, for each $i \in \{1, \dots, m\}$, rejecting $H_{0,i}$ whenever $p_i(X) \le \alpha$ defines a test of
level $\alpha$. As we have discussed in the previous section, property (2) can be fulfilled in many
situations. Also, in some cases, (2) holds with equality, that is, the $p_i(X)$ are exactly distributed
like a uniform variable in (0, 1) when $H_{0,i}$ is true.

1.4. Multiple testing procedures 

In the remainder of the paper, we use the observation $X$ only through the p-value family
$\mathbf{p}(X) = \{p_i(X), 1 \le i \le m\}$. Therefore, for short, we often drop the dependence in $X$ in the notation and
define all quantities as functions of $\mathbf{p} = \{p_i, 1 \le i \le m\} \in [0, 1]^m$. However, one should keep
in mind that the underlying distribution $P$ (the distribution of interest on which the tests are
performed) is the distribution of $X$ and not the one of $\mathbf{p}$.

A multiple testing procedure is defined as a set-valued function

$$R : q = (q_i)_{1 \le i \le m} \in [0, 1]^m \longmapsto R(q) \subset \{1, \dots, m\},$$

taking as input an element of $[0, 1]^m$ and returning a subset of $\{1, \dots, m\}$. For such a general
procedure $R$, we add the technical assumption that for each $i \in \{1, \dots, m\}$, the mapping
$x \in \mathcal{X} \mapsto \mathbf{1}\{i \in R(\mathbf{p}(x))\}$ is measurable. The indexes selected by $R(\mathbf{p})$ correspond to the rejected null
hypotheses, that is, $i \in R(\mathbf{p}) \Leftrightarrow$ "$H_{0,i}$ is rejected by the procedure $R(\mathbf{p})$". Thus, for each p-value
family $\mathbf{p}$, there are $2^m$ possible outcomes for $R(\mathbf{p})$. Nevertheless, according to the stochastic
dominance property (2) of the p-values, a natural rejection region for each $H_{0,i}$ is of the form
$p_i \le t_i$, for some $t_i \in [0, 1]$. In this paper, we mainly focus on the case where the threshold is the
same for all p-values. The corresponding procedures, called thresholding based procedures, are
of the form $R(\mathbf{p}) = \{1 \le i \le m : p_i \le t(\mathbf{p})\}$, where the threshold $t(\cdot) \in [0, 1]$ can depend on the
data.

Example 1.4 (Bonferroni procedure). The Bonferroni procedure (of level $\alpha \in (0, 1)$) rejects the
hypotheses with a p-value smaller than or equal to $\alpha/m$. Hence, with our notation, it corresponds to the
procedure $R(\mathbf{p}) = \{1 \le i \le m : p_i \le \alpha/m\}$.
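A thresholding based procedure, and the Bonferroni procedure as a special case, can be sketched in a few lines. The function names are illustrative; indices are 1-based to match the notation of the text.

```python
def threshold_procedure(p, t):
    """Return R(p) = {i : p_i <= t}, with 1-based indices as in the text."""
    return {i for i, p_i in enumerate(p, start=1) if p_i <= t}

def bonferroni(p, alpha):
    """Bonferroni procedure of level alpha: constant threshold alpha / m."""
    return threshold_procedure(p, alpha / len(p))

p = [0.001, 0.02, 0.012, 0.6]
rejected = bonferroni(p, 0.05)  # threshold alpha/m = 0.0125
print(rejected)  # {1, 3}
```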

1.5. Type I error rates 

To evaluate the quality of a multiple testing procedure, various error rates have been proposed
in the literature. According to the Neyman-Pearson approach, type I error rates are of primary
interest. These rates evaluate the importance of the null hypotheses wrongly rejected, that is, of
the elements of the set $R(\mathbf{p}) \cap \mathcal{H}_0(P)$. Nowadays, the most widely used type I error rates are the
following. For a given procedure $R$,

- the k-family-wise error rate ($k$-FWER) (see e.g. [32, 44, 36]) is defined as the probability
that the procedure $R$ makes at least $k$ false rejections: for all $P \in \mathcal{P}$,

$$k\text{-FWER}(R, P) = \mathbb{P}(|R(\mathbf{p}) \cap \mathcal{H}_0(P)| \ge k), \tag{3}$$

where $k \in \{1, \dots, m\}$ is a pre-specified parameter. In the particular case where $k = 1$, this
rate is simply called the family-wise error rate and is denoted by $\mathrm{FWER}(R, P)$.

- the false discovery proportion (FDP) (see e.g. [53, 5, 36]) is defined as the proportion of
errors in the set of the rejected hypotheses: for all $P \in \mathcal{P}$,

$$\mathrm{FDP}(R(\mathbf{p}), P) = \frac{|R(\mathbf{p}) \cap \mathcal{H}_0(P)|}{|R(\mathbf{p})| \vee 1}, \tag{4}$$

where $|R(\mathbf{p})| \vee 1$ denotes the maximum of $|R(\mathbf{p})|$ and 1. The role of the term "$\vee 1$" in the
denominator is to prevent dividing by zero when $R$ makes no rejection. Since the FDP
is a random variable, it does not define an error rate. However, the following error rates
can be derived from the FDP. First, the $\gamma$-upper-tail distribution of the FDP, defined as the
probability that the FDP exceeds a given $\gamma$, that is, for all $P \in \mathcal{P}$,

$$\mathbb{P}(\mathrm{FDP}(R(\mathbf{p}), P) > \gamma), \tag{5}$$

where $\gamma \in (0, 1)$ is a pre-specified parameter. Second, the false discovery rate (FDR) [5],
defined as the expectation of the FDP: for all $P \in \mathcal{P}$,

$$\mathrm{FDR}(R, P) = \mathbb{E}[\mathrm{FDP}(R(\mathbf{p}), P)] = \mathbb{E}\left[\frac{|R(\mathbf{p}) \cap \mathcal{H}_0(P)|}{|R(\mathbf{p})| \vee 1}\right]. \tag{6}$$

Note that the probability in (5) is upper-bounded by a nominal level $\alpha \in (0, 1)$ if and only if the
$(1 - \alpha)$-quantile of the FDP distribution is upper-bounded by $\gamma$. For instance, if the probability
in (5) is upper-bounded by $\alpha = 1/2$, this means that the median of the FDP is upper-bounded
by $\gamma$. With some abuse, bounding the probability in (5) is called "controlling the FDP" from now
on.
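The quantities in (3) to (6) can be made concrete with a small sketch. It assumes that the set of true nulls is known, which is only the case in simulations; the function names are illustrative. The last helper turns a list of simulated FDP values into Monte Carlo estimates of the FDR (6) and of the upper-tail probability (5).

```python
def false_discovery_proportion(rejected, true_nulls):
    """FDP as in (4): false rejections divided by max(number of rejections, 1)."""
    return len(rejected & true_nulls) / max(len(rejected), 1)

def makes_k_false_rejections(rejected, true_nulls, k):
    """Event whose probability is the k-FWER in (3)."""
    return len(rejected & true_nulls) >= k

def summarize_fdp(fdp_values, gamma):
    """Monte Carlo estimates of the FDR (6) and of P(FDP > gamma) (5)."""
    n = len(fdp_values)
    fdr_hat = sum(fdp_values) / n
    tail_hat = sum(1 for v in fdp_values if v > gamma) / n
    return fdr_hat, tail_hat

print(false_discovery_proportion({1, 2, 3, 4}, {4, 5, 6}))  # 0.25
print(false_discovery_proportion(set(), {4, 5, 6}))         # 0.0
print(summarize_fdp([0.0, 0.0, 0.5, 0.1], gamma=0.2))
```

The last call illustrates that a small average FDP is compatible with a noticeable probability of a large FDP.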

The choice of the type I error rate depends on the context. When controlling the $k$-FWER,
we tolerate a fixed number $(k - 1)$ of erroneous rejections. By contrast, a procedure controlling
(5) tolerates a small proportion $\gamma$ of errors among the final rejections (from an intuitive point
of view, it chooses $k \simeq \gamma |R|$). This allows the number of erroneous rejections to increase as the
number of rejections becomes large. Next, controlling the FDR has become popular because it
is a simple error rate based on the FDP and because it came together with the simple
Benjamini-Hochberg FDR controlling procedure [5] (some dependency structure assumptions are required,
see Section 3). On the other hand, controlling the FDR does not prevent the FDP from having
large variations, so that an FDR control does not necessarily have a clear interpretation in terms
of the FDP (see the related discussion in Section 6.2).

Example 1.4 (Continued). The Bonferroni procedure $R(\mathbf{p}) = \{1 \le i \le m : p_i \le \alpha/m\}$ satisfies
the following:

$$\mathbb{E}|R(\mathbf{p}) \cap \mathcal{H}_0(P)| = \sum_{i \in \mathcal{H}_0(P)} \mathbb{P}(p_i \le \alpha/m) \le \alpha\, m_0(P)/m \le \alpha,$$

which means that its expected number of false discoveries is below $\alpha$. Using Markov's inequality,
this implies that $R(\mathbf{p})$ makes no false discovery with probability at least $1 - \alpha$, that is, for any
$P \in \mathcal{P}$, $\mathrm{FWER}(R, P) \le \alpha$. This is the most classical example of type I error rate control.
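For completeness, the Markov step can be written out as follows, using the notation of (2) and (3) with $k = 1$. The number of false rejections is a nonnegative integer, so the event "at least one false rejection" has probability at most its expectation:

```latex
\[
\mathrm{FWER}(R, P)
  = \mathbb{P}\bigl(|R(\mathbf{p}) \cap \mathcal{H}_0(P)| \ge 1\bigr)
  \le \mathbb{E}\,|R(\mathbf{p}) \cap \mathcal{H}_0(P)|
  \le \frac{\alpha\, m_0(P)}{m}
  \le \alpha .
\]
```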

Remark 1.5 (Case where $\mathcal{H}_0(P) = \mathcal{H}$). For a distribution $P$ satisfying $\mathcal{H}_0(P) = \mathcal{H}$, that is
when all null hypotheses are true, the FDP reduces to $\mathrm{FDP}(R(\mathbf{p}), P) = \mathbf{1}\{|R(\mathbf{p})| > 0\}$ and we
have $\mathrm{FWER}(R, P) = \mathrm{FDR}(R, P) = \mathbb{P}(\mathrm{FDP}(R(\mathbf{p}), P) > \gamma) = \mathbb{P}(|R(\mathbf{p})| > 0)$. Controlling the FWER
(or equivalently the FDR) in this situation is sometimes called a "weak" FWER control.

Remark 1.6 (Case where all null hypotheses are equal: p-value aggregation). The general
framework described in Section 1.3 includes the case where all null hypotheses are identical, that is,
$\Theta_{0,i} = \Theta_0$ for all $i \in \{1, \dots, m\}$. In this situation, all p-values test the same null $H_0$: "$P \in \Theta_0$"
against some alternatives contained in $\Theta_0^c$. For instance, in the model selection framework of
[3, 18, 60], each p-value is built with respect to a specific model contained in the alternative $\Theta_0^c$.
Since we have in that case $\mathcal{H}_0(P) = \mathcal{H}$ if $P \in \Theta_0$ and $\mathcal{H}_0(P) = \emptyset$ otherwise, the three quantities
$\mathrm{FWER}(R, P)$, $\mathrm{FDR}(R, P)$ and $\mathbb{P}(\mathrm{FDP}(R(\mathbf{p}), P) > \gamma)$ are equal and take the value $\mathbb{P}(|R(\mathbf{p})| > 0)$
when $P \in \Theta_0$ and 0 otherwise. As a consequence, in the case where all null hypotheses are equal,
controlling the FWER, the FDR or the FDP at level $\alpha$ is equivalent to the problem of combining
p-values to build a single test for $H_0$ which is of level $\alpha$. In particular, from a procedure $R$ that
controls the FWER at level $\alpha$ we can derive a single testing procedure of level $\alpha$ by rejecting
$H_0$ whenever $R(\mathbf{p})$ is not empty (that is, whenever $R(\mathbf{p})$ rejects at least one hypothesis). This
provides a way to aggregate p-values into one (single) test for $H_0$ which is ensured to be of level
$\alpha$. As an illustration, the FWER controlling Bonferroni procedure $R = \{1 \le i \le m : p_i \le \alpha/m\}$
corresponds to the single test rejecting $H_0$ whenever $\min_{1 \le i \le m}\{p_i\} \le \alpha/m$. The Bonferroni
combination of individual tests is well known and extensively used for adaptive testing (see, e.g.,
[54, 3, 60]). Some other examples of p-value aggregations will be presented further on, see
Remark 3.9.

1.6. Goal 

Let $\alpha \in (0, 1)$ be a pre-specified nominal level (to be fixed once and for all throughout the
paper). The goal is to control the type I error rates defined above at level $\alpha$, for a large subset of
distributions $\mathcal{P}' \subset \mathcal{P}$. That is, by taking one of the above error rates $\mathcal{E}(R, P)$, we aim at finding
a procedure $R$ such that

$$\forall P \in \mathcal{P}', \quad \mathcal{E}(R, P) \le \alpha, \tag{7}$$

for $\mathcal{P}' \subset \mathcal{P}$ as large as possible. Obviously, $R$ should depend on $\alpha$ but we omit this in the
notation for short. Similarly to the single testing case, taking $R = \emptyset$ will always ensure (7) with
$\mathcal{P}' = \mathcal{P}$. This means that the type I error rate control is inseparable from the problem of
maximizing the power. Probably the most natural way to extend the notion of power from the single
testing to the multiple testing setting is to consider the expected number of correct rejections, that
is, $\mathbb{E}|\mathcal{H}_1(P) \cap R|$. Throughout the paper, we often encounter the case where two procedures $R$
and $R'$ satisfy $R' \subset R$ (almost surely) while they both ensure the control (7). Then, the procedure
$R$ is said to be less conservative than $R'$. Obviously, this implies that $R$ is more powerful than $R'$. This
can be the case when, e.g., $R$ and $R'$ are thresholding-based procedures using respective
thresholds $t$ and $t'$ satisfying $t \ge t'$ (almost surely). As a consequence, our goal is to find a procedure $R$
satisfying (7) with a rejection set as large as possible.

Finally, let us emphasize that, in this paper, we aim at controlling (7) for any fixed $m \ge 2$ and
not only when $m$ tends to infinity. That is, the setting is non-asymptotic in the parameter $m$.

1.7. Overview of the paper 

The remainder of the paper is organized as follows: in Section 2, we present some general tools
and concepts that are useful throughout the paper. Sections 3, 4 and 5 present FDR, $k$-FWER and
FDP controlling methodology, respectively, where we try to give a large overview of classical
methods in the literature. Besides, the paper is meant to have a scholarly form, accessible to a
possibly non-specialist reader. In particular, all results are given together with a proof, which we
aim to be as short and meaningful as possible.

Furthermore, while this paper is mostly intended to be a review paper, some new contributions
with respect to the existing multiple testing literature are given in Sections 4 and 5, by extending
the results of [30] for the $k$-FWER control and the results of [45] for the FDP control,
respectively.

1.8. Quantile-binomial procedure 

In Section 5, we introduce a novel procedure, called the quantile-binomial procedure, that controls
the FDP under independence of the p-values. This procedure can be defined as follows:

Algorithm 1.7 (Quantile-binomial procedure). Let for any $t \in [0, 1]$ and for any $\ell \in \{1, \dots, m\}$,

$$q_\ell(t) = \text{the } (1 - \alpha)\text{-quantile of } \mathcal{B}(m - \ell + \lfloor \gamma(\ell - 1) \rfloor + 1, \, t), \tag{8}$$

where $\mathcal{B}(\cdot, \cdot)$ denotes the binomial distribution and $\lfloor \gamma(\ell - 1) \rfloor$ denotes the largest integer $n$ such
that $n \le \gamma(\ell - 1)$. Let $p_{(1)} \le \dots \le p_{(m)}$ be the order statistics of the p-values. Then apply the
following recursion:

• Step 1: if $q_1(p_{(1)}) > \gamma$, stop and reject no hypothesis. Otherwise, go to step 2;
• Step $\ell \in \{2, \dots, m\}$: if $q_\ell(p_{(\ell)}) > \gamma\ell$, stop and reject the hypotheses corresponding to
$p_{(1)}, \dots, p_{(\ell - 1)}$. Otherwise, go to step $\ell + 1$;
• Step $\ell = m + 1$: stop and reject all hypotheses.

Equivalently, the above procedure can be defined as rejecting $H_{0,i}$ whenever

$$\max_{\ell \,:\, p_{(\ell)} \le p_i} \{ q_\ell(p_{(\ell)}) / \ell \} \le \gamma.$$

The rationale behind this algorithm is that at step $\ell$, when rejecting the $\ell$ null hypotheses
corresponding to the p-values smaller than or equal to $p_{(\ell)}$, the number of false discoveries behaves as if it
was stochastically dominated by a binomial variable of parameters $(m - \ell + \lfloor \gamma(\ell - 1) \rfloor + 1, \, p_{(\ell)})$.
Hence, by controlling the $(1 - \alpha)$-quantile of the latter binomial variable at level $\gamma\ell$, the
$(1 - \alpha)$-quantile of the FDP should be controlled by $\gamma$. The rigorous proof of the corresponding FDP
control is given in Section 5, see Corollary 5.4. Finally, when controlling the median of the FDP,
this procedure is related to the recent adaptive procedure of [26], as discussed in Section 6.3.
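The recursion of Algorithm 1.7 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the $(1 - \alpha)$-quantile is taken to be the smallest $k$ with $\mathbb{P}(\mathcal{B}(n, t) \le k) \ge 1 - \alpha$, the p-values are assumed to have no ties, and the naive binomial sum is only suitable for moderate $m$. The function names are hypothetical.

```python
import math

def binomial_quantile(n, t, level):
    """Smallest k such that P(Bin(n, t) <= k) >= level (assumed quantile convention)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * t**k * (1.0 - t) ** (n - k)
        if cdf >= level:
            return k
    return n  # guards against rounding when level is very close to 1

def quantile_binomial_procedure(p, alpha, gamma):
    """Return the set of rejected indices (1-based), following steps 1..m+1."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])  # order[l-1] is the index of p_(l)
    n_reject = m  # step m+1: reject everything if no earlier step stops
    for ell in range(1, m + 1):
        # Floating-point products such as gamma * (ell - 1) may need care near integers.
        n = m - ell + math.floor(gamma * (ell - 1)) + 1
        q = binomial_quantile(n, p[order[ell - 1]], 1.0 - alpha)
        if q > gamma * ell:
            n_reject = ell - 1  # stop and keep only the first ell - 1 hypotheses
            break
    return {i + 1 for i in order[:n_reject]}

p = [0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.3, 0.5, 0.7, 0.8, 0.9]
print(quantile_binomial_procedure(p, alpha=0.05, gamma=0.2))  # {1, 2, 3, 4, 5}
```

In this example the first five steps each give a binomial quantile of 0, and the recursion stops at step 6 because the 0.95-quantile of $\mathcal{B}(6, 0.3)$ is 4, which exceeds $\gamma\ell = 1.2$.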

2. Key concepts and tools 
2.1. Model assumptions 

Throughout this paper, we will consider several models. Each model corresponds to a specific
assumption on the distribution of the p-value family $\mathbf{p} = \{p_i, 1 \le i \le m\}$. The first model, called the
"independent model", is defined as follows:

$$\mathcal{P}^{I} = \{ P \in \mathcal{P} : (p_i(X))_{i \in \mathcal{H}_0(P)} \text{ is a family of mutually independent variables and } (p_i(X))_{i \in \mathcal{H}_0(P)} \text{ is independent of } (p_i(X))_{i \in \mathcal{H}_1(P)} \}. \tag{9}$$

The second model uses a particular notion of positive dependence between the p-values, called
"weak positive regression dependency" (in short, "weak PRDS"), which is a slightly weaker
version of the PRDS assumption of [8]. To introduce the weak PRDS property, let us define a
subset $D \subset [0, 1]^m$ as nondecreasing if for all $q, q' \in [0, 1]^m$ such that $\forall i \in \{1, \dots, m\}$, $q_i \le q'_i$, we
have $q' \in D$ when $q \in D$.

Definition 2.1 (Weak PRDS p-value family). The family $\mathbf{p}$ is said to be weak PRDS on $\mathcal{H}_0(P)$
if for any $i_0 \in \mathcal{H}_0(P)$ and for any measurable nondecreasing set $D \subset [0, 1]^m$, the function
$u \mapsto \mathbb{P}(\mathbf{p} \in D \mid p_{i_0} \le u)$ is nondecreasing on the set $\{u \in [0, 1] : \mathbb{P}(p_{i_0} \le u) > 0\}$.

The only difference between the weak PRDS assumption and the "regular" PRDS assumption
defined in [8] is that the latter assumes "$u \mapsto \mathbb{P}(\mathbf{p} \in D \mid p_{i_0} = u)$ nondecreasing", instead of
"$u \mapsto \mathbb{P}(\mathbf{p} \in D \mid p_{i_0} \le u)$ nondecreasing". Weak PRDS is a weaker assumption, as shown for
instance in the proof of Proposition 3.6 in [12]. We can now define the second model, where the
p-values have weak PRDS dependency:

$$\mathcal{P}^{pos} = \{ P \in \mathcal{P} : \mathbf{p}(X) \text{ is weak PRDS on } \mathcal{H}_0(P) \}. \tag{10}$$

It is not difficult to see that $\mathcal{P}^{I} \subset \mathcal{P}^{pos}$ because when $P \in \mathcal{P}^{I}$, $p_{i_0}$ is independent of $(p_i)_{i \neq i_0}$,
for any $i_0 \in \mathcal{H}_0(P)$. Furthermore, we refer to the general case of $P \in \mathcal{P}$ (without any additional
restriction) as the "arbitrary dependence case".

As an illustration, in the one-sided Gaussian testing framework of Example 1.2, the PRDS
assumption (regular and thus also weak) is satisfied as soon as the covariance matrix $\Sigma(P)$ has
nonnegative entries, as shown in [8] (note that this is not true anymore for two-sided tests, as
proved in the latter reference).

2.2. Dirac configurations 

If we want to check whether a procedure satisfies a type I error rate control (7), particularly
simple p-value distributions (or "configurations") are as follows:

- "Dirac configurations": the p-values of $\mathcal{H}_1(P)$ are equal to zero (without any assumption
on the p-values of $\mathcal{H}_0(P)$);

- "Dirac-uniform configuration" (see [24]): the Dirac configuration for which the variables
$(p_i)_{i \in \mathcal{H}_0(P)}$ are i.i.d. uniform.

These configurations can be seen as the asymptotic p-value family distribution where the sample
size available to perform each test tends to infinity, while the number $m$ of tests is kept fixed
(see the examples of Section 1.2). This situation does not fall into the classical multiple testing
framework where the number of tests is much larger than the sample size. Besides, there is no
multiple testing problem in these configurations because the true nulls are perfectly separated
from the false nulls (almost surely). However, these special configurations are still interesting,
because they sometimes have the property to be the distributions for which the type I error rate
is the largest. In that case, they are called the "least favorable configurations" (see [24]). This
generally requires that the multiple testing procedure and the error rate under consideration have
special monotonic properties (see [23, 48]). In this case, proving the type I error rate control for
the Dirac configurations is sufficient to state (7) and thus appears to be very useful.
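A Dirac-uniform configuration is easy to simulate, which makes it a convenient test bed for checking an error rate numerically. In the sketch below, placing the true nulls in the first $m_0$ coordinates is an arbitrary convention of this illustration, and the function name is hypothetical.

```python
import random

def dirac_uniform_p_values(m, m0, rng):
    """Draw one p-value family from a Dirac-uniform configuration.

    The first m0 coordinates (true nulls) are i.i.d. uniform on [0, 1);
    the remaining m - m0 coordinates (false nulls) are exactly 0.
    """
    return [rng.random() for _ in range(m0)] + [0.0] * (m - m0)

rng = random.Random(1)  # fixed seed, for reproducibility only
p = dirac_uniform_p_values(6, 4, rng)
print(p)
```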

2.3. Algorithms 

To derive (7), a generic method that emerged from the multiple testing literature is as follows: 

1. start with a family $(R_\kappa)_\kappa$ of procedures depending on an external parameter $\kappa$;

2. find a set of values of $\kappa$ for which $R_\kappa$ satisfies (7);

3. take among these values the $\kappa$ that makes $R_\kappa$ the "largest".

The latter is designed to maintain the control of the type I error rate while maximizing the
rejection set. As we will see in Section 3 ($\kappa$ is a threshold $t$), Section 4 ($\kappa$ is a subset of $\mathcal{H}$) and
Section 5 ($\kappa$ is a rejection number $\ell$), this gives rise to the so-called "step-up" and "step-down"
algorithms, which are very classical instances of type I error rate controlling procedures.

2.4. Adaptive control 

A way to increase the power of type I error rate controlling procedures is to learn (from the
data) part of the unknown distribution $P$ in order to make more rejections. This approach is
called "adaptive type I error rate control". Since the resulting procedure uses the data twice, the
main challenge is often to show that it maintains the type I error control (7). In this paper, we will
discuss adaptivity with respect to the parameter $m_0(P) = |\mathcal{H}_0(P)|$ for the FDR in Section 3.3. The
procedures presented in Section 4 (resp. Section 5) for controlling the $k$-FWER (resp. FDP) will
also be adaptive to $m_0(P)$, but in a maybe more implicit way. Some of them will be additionally
adaptive with respect to the dependency structure between the p-values. Let us finally note that
some other work studied the adaptivity to the alternative distributions of the p-values (see [62,
49, 47]).



3. FDR control 



After the seminal work of Benjamini and Hochberg [5], many studies have investigated the FDR 
controlling issue. We provide in this section a survey of some of these approaches. 



3.1. Thresholding based procedures 



Let us start from thresholding type multiple testing procedures

$$R_t = \{1 \le i \le m : p_i \le t(\mathbf{p})\},$$

with a threshold $t(\cdot) \in [0, 1]$ possibly depending on the p-values. We want to find $t$ such that the
corresponding multiple testing procedure $R_t$ controls the FDR at level $\alpha$ under the model $\mathcal{P}^{pos}$,
by following the general method explained in Section 2.3. We start with the following simple
decomposition of the false discovery rate of $R_t$:

$$\mathrm{FDR}(R_t, P) = \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[\frac{\mathbf{1}\{p_i \le t(\mathbf{p})\}}{\alpha\, G(\mathbf{p}, t(\mathbf{p})) \vee (\alpha/m)}\right], \tag{11}$$

where $G(\mathbf{p}, u) = m^{-1} \sum_{i=1}^{m} \mathbf{1}\{p_i \le u\}$ denotes the empirical c.d.f. of the p-value family
$\mathbf{p} = \{p_i, 1 \le i \le m\}$ taken at a threshold $u \in [0, 1]$.

In order to upper-bound the expectation in the RHS of (11), let us consider the following
informal reasoning: if $t$ and $G$ were deterministic, this expectation would be smaller than $t / (\alpha\, G(\mathbf{p}, t))$
and thus smaller than 1 by taking a threshold $t$ such that $t \le \alpha\, G(\mathbf{p}, t)$. This motivates the
introduction of the following set of thresholds:

$$\mathcal{T}(\mathbf{p}) = \{u \in [0, 1] : G(\mathbf{p}, u) \ge u/\alpha\}. \tag{12}$$

With different notation, the latter was introduced in [12, 23]. Here, any threshold $t \in \mathcal{T}(\mathbf{p})$ is said to be
"self-consistent" because it corresponds to a procedure $R_t = \{1 \le i \le m : p_i \le t\}$ which is
"self-consistent" according to the definition given in [12], that is, $R_t \subset \{1 \le i \le m : p_i \le \alpha |R_t| / m\}$.
It is important to note that the set $\mathcal{T}(\mathbf{p})$ only depends on the p-value family (and on $\alpha$) so that
self-consistent thresholds can be easily chosen in practice. As an illustration, we depict the set
$\mathcal{T}(\mathbf{p})$ in Figure 1 for a particular realization of the p-value family.
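Membership in the set (12) only requires the empirical c.d.f., so it is straightforward to check numerically. The sketch below uses $m = 10$ and $\alpha = 0.5$ as in Figure 1, but the p-values themselves are made up for illustration and the function names are hypothetical.

```python
def ecdf(p, u):
    """Empirical c.d.f. of the p-value family at u: the fraction of p_i <= u."""
    return sum(1 for p_i in p if p_i <= u) / len(p)

def is_self_consistent(p, u, alpha):
    """Check whether the threshold u belongs to the set (12)."""
    return ecdf(p, u) >= u / alpha

p = [0.01, 0.04, 0.1, 0.3, 0.35, 0.5, 0.6, 0.7, 0.8, 0.9]  # made-up values, m = 10
alpha = 0.5
for u in (0.0, 0.05, 0.1, 0.2, 0.3):
    print(u, ecdf(p, u), is_self_consistent(p, u, alpha))
```

For instance, $u = 0.1$ is self-consistent here because three of the ten p-values lie below it, so the empirical c.d.f. equals 0.3, which is at least $u/\alpha = 0.2$. By contrast, $u = 0.2$ is not, since the c.d.f. is still 0.3 while $u/\alpha = 0.4$.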

Figure 1. The p-value e.c.d.f. $G(\mathbf{p}, u)$ and $u/\alpha$ are plotted as functions of $u \in [0, 1]$. The points $u$ belonging to the set
$\mathcal{T}(\mathbf{p})$ lie on the X-axis of the gray area. $m = 10$; $\alpha = 0.5$.

Now, let us choose a self-consistent threshold $t(\mathbf{p}) \in \mathcal{T}(\mathbf{p})$. By using the decomposition (11),
we obtain the following upper-bound:

$$\mathrm{FDR}(R_t, P) \le \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[\frac{\mathbf{1}\{p_i \le t(\mathbf{p})\}}{t(\mathbf{p}) \vee (\alpha/m)}\right] \le \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[\frac{\mathbf{1}\{p_i \le t(\mathbf{p})\}}{t(\mathbf{p})}\right], \tag{13}$$

with the convention $\frac{0}{0} = 0$. Since by (2), we have $p_i(x) > 0$ for $P$-almost every $x$ when $i \in \mathcal{H}_0(P)$,
the denominator inside the expectation of the RHS of (13) can only be zero when the numerator
is also zero and therefore when the ratio is zero. Next, the following purely probabilistic lemma
holds (see a proof in Appendix A of [12] for instance):

Lemma 3.1. Let $U$ be a nonnegative random variable which is stochastically lower bounded by
a uniform distribution, i.e., $\mathbb{P}(U \le u) \le u$ for any $u \in [0, 1]$. Then the following inequality holds:

$$\mathbb{E}\left[\frac{\mathbf{1}\{U \le V\}}{V}\right] \le 1, \tag{14}$$

for any nonnegative random variable $V$ satisfying either of the two following conditions:

(i) $V = g(U)$ where $g : \mathbb{R}^+ \to \mathbb{R}^+$ is non-increasing,

(ii) the conditional distribution of $V$ conditionally on $U \le u$ is stochastically decreasing in $u$,
that is, $\forall v \ge 0$, $u \mapsto \mathbb{P}(V < v \mid U \le u)$ is nondecreasing on $\{u \in [0, 1] : \mathbb{P}(U \le u) > 0\}$.

A consequence of the previous lemma in combination with (13) is that the FDR is controlled at
level $\alpha\, m_0(P)/m$ as soon as $V = t(\mathbf{p})$ satisfies (ii) with $U = p_i$. For the latter to be true, we should
make the distributional assumption $P \in \mathcal{P}^{pos}$ and add the assumption that the threshold $t(\cdot)$ is
non-increasing with respect to each p-value, that is, for all $q, q' \in [0, 1]^m$, we have $t(q) \le t(q')$ as
soon as for all $1 \le i \le m$, $q'_i \le q_i$. By using the latter, we easily check that the set

$$D = \{q \in [0, 1]^m : t(q) < v\}$$

is a nondecreasing measurable set of $[0, 1]^m$, for any $v \ge 0$. Thus, the weak PRDS condition
defined in Section 2.1 provides (ii) with $U = p_i$ and $V = t(\mathbf{p})$ and thus also (14). Summing up,
we obtained the following result, which appeared in [12]:

Theorem 3.2. Consider a thresholding type multiple testing procedure $R_t$ based on a threshold
$t(\cdot)$ satisfying the two following conditions:

- $t(\cdot)$ is self-consistent, i.e., such that for all $q \in [0,1]^m$, $t(q) \in \mathcal{T}(q)$ (where $\mathcal{T}(\cdot)$ is the
set of self-consistent thresholds introduced above);

- $t(\cdot)$ is coordinate-wise non-increasing, i.e., satisfying that for all $q, q' \in [0,1]^m$ with $q'_i \leq q_i$
for all $1 \leq i \leq m$, we have $t(q) \leq t(q')$.

Then, for any $P \in \mathcal{P}^{pos}$, $\mathrm{FDR}(R_t,P) \leq \alpha m_0(P)/m \leq \alpha$.

Remark 3.3. If we want to state the FDR control of Theorem 3.2 only for $P \in \mathcal{P}^{I}$ without using
the PRDS property, we can use Lemma 3.1 (i) conditionally on $\mathbf{p}_{-i} = (p_j, j \neq i) \in [0,1]^{m-1}$ by
taking $V = t(U, \mathbf{p}_{-i})$ and $U = p_i$, because $p_i$ is independent of $\mathbf{p}_{-i}$ when $P \in \mathcal{P}^{I}$.

3.2. Linear step-up procedures 

From Theorem 3.2, under the weak PRDS assumption on the $p$-value dependence structure, any
algorithm giving as output a self-consistent and non-increasing threshold $t(\cdot)$ leads to a correct
FDR control. As explained in Section 1.6 and Section 2.3, for the same FDR control we want
to get a procedure with a rejection set as large as possible. Hence, it is natural to choose the
following threshold:

\[
\begin{aligned}
t^{su}(\mathbf{p}) &= \max\{\mathcal{T}(\mathbf{p})\} && (15) \\
&= \max\{u \in \{\alpha k/m, 0 \leq k \leq m\} : \hat{G}(\mathbf{p},u) \geq u/\alpha\} \\
&= \alpha/m \times \max\{0 \leq k \leq m : p_{(k)} \leq \alpha k/m\}, && (16)
\end{aligned}
\]

where $p_{(1)} \leq \dots \leq p_{(m)}$ ($p_{(0)} = 0$) denote the order statistics of the $p$-value family. This choice
was made in [5] and is usually called linear step-up or "Benjamini-Hochberg" thresholding. One
should notice that the maximum in (15) exists because the set $\mathcal{T}(\mathbf{p})$ contains 0, is upper-bounded
by 1 and because the e.c.d.f. is a non-decreasing function (the right-continuity is not needed). It
is also easy to check that the maximum $u = t^{su}(\mathbf{p})$ satisfies the equality $\hat{G}(\mathbf{p},u) = u/\alpha$, so that
$t^{su}(\mathbf{p})$ can be seen as the largest crossing point between $u \mapsto \hat{G}(\mathbf{p},u)$ and $u \mapsto u/\alpha$, see
the left-side of Figure 2. The latter equality also implies that $t^{su}(\mathbf{p}) \in \{\alpha k/m, 0 \leq k \leq m\}$, which,
combined with the so-called switching relation

\[
m \hat{G}(\mathbf{p}, \alpha k/m) \geq k \iff p_{(k)} \leq \alpha k/m,
\]

gives rise to the second formulation (16). The latter is illustrated in the right-side of Figure 2.
The formulation (16) corresponds to the original expression of [5] while (15) is to be found for
instance in [27]. Moreover, it is worth noticing that the procedure $R_{t^{su}}$ using the thresholding
$t^{su}(\mathbf{p})$ is also equal to $\{1 \leq i \leq m : p_i \leq t^{su}(\mathbf{p}) \vee \alpha/m\}$, so that it can be interpreted as an
intermediate thresholding between the non-corrected procedure using $t = \alpha$ and the Bonferroni
procedure using $t = \alpha/m$.
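
The formulation (16) translates almost literally into code. The following is a minimal Python sketch, not taken from the paper: the function names are illustrative, hypotheses are indexed from 0, and the $p$-values are assumed to be plain floats in $[0,1]$.

```python
from typing import List, Sequence


def linear_step_up_threshold(pvalues: Sequence[float], alpha: float) -> float:
    """Return (alpha/m) * max{0 <= k <= m : p_(k) <= alpha*k/m}, with p_(0) = 0."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    k_hat = 0  # k = 0 always qualifies because p_(0) = 0
    for k in range(1, m + 1):
        if ordered[k - 1] <= alpha * k / m:
            k_hat = k  # keep the LARGEST k that qualifies (step-up)
    return alpha * k_hat / m


def linear_step_up(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Indices (0-based) of the hypotheses rejected by the linear step-up procedure."""
    t = linear_step_up_threshold(pvalues, alpha)
    return [i for i, p in enumerate(pvalues) if p <= t]


if __name__ == "__main__":
    # The smallest p-value exceeds alpha/m = 0.01, yet it is rejected because
    # the ordered p-values cross the line alpha*k/m again at k = 3.
    print(linear_step_up([0.015, 0.018, 0.025, 0.5, 0.9], alpha=0.05))
```

The loop scans all $k$ rather than stopping at the first failure, which is what distinguishes a step-up rule from a step-down rule.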

Figure 2. The two dual pictorial representations of the Benjamini-Hochberg linear step-up procedure. Left: e.c.d.f.
of the $p$-values, the solid line has slope $\alpha^{-1}$. Right: ordered $p$-values, the solid line has slope $\alpha/m$. In both
pictures, the filled points represent $p$-values that correspond to the rejected hypotheses. $m = 10$; $\alpha = 0.5$.

Clearly, $t^{su}(\cdot)$ is coordinate-wise non-increasing and self-consistent. Therefore, Theorem 3.2
shows that for any $P \in \mathcal{P}^{pos}$, $\mathrm{FDR}(R_{t^{su}},P) \leq \alpha m_0(P)/m$. As a matter of fact, as soon as (2) holds
with an equality, we can prove that for any $P \in \mathcal{P}^{I}$, the equality $\mathrm{FDR}(R_{t^{su}},P) = \alpha m_0(P)/m$ holds,
by using a surprisingly direct argument. Let $\mathbf{p}_{0,-i}$ denote the $p$-value family where $p_i$ has been
replaced by 0, and observe that the following statements are equivalent, for any realization of the
$p$-values:

(i) $p_i \leq t^{su}(\mathbf{p}_{0,-i})$

(ii) $\hat{G}(\mathbf{p}_{0,-i}, t^{su}(\mathbf{p}_{0,-i})) \leq \hat{G}(\mathbf{p}, t^{su}(\mathbf{p}_{0,-i}))$

(iii) $t^{su}(\mathbf{p}_{0,-i})/\alpha \leq \hat{G}(\mathbf{p}, t^{su}(\mathbf{p}_{0,-i}))$

(iv) $t^{su}(\mathbf{p}_{0,-i}) \leq t^{su}(\mathbf{p})$.

The equivalence between (i) and (ii) is straightforward from the definition of $\hat{G}(\cdot,\cdot)$. The
equivalence between (ii) and (iii) follows from $\hat{G}(\mathbf{p}_{0,-i}, t^{su}(\mathbf{p}_{0,-i})) = t^{su}(\mathbf{p}_{0,-i})/\alpha$, because $t = t^{su}(\mathbf{p}_{0,-i})$
is a crossing point between $\hat{G}(\mathbf{p}_{0,-i}, t)$ and $t/\alpha$. The equivalence between (iii) and (iv) comes
from the definition of $t^{su}(\mathbf{p})$ together with $t^{su}(\mathbf{p}_{0,-i}) \leq t^{su}(\mathbf{p}) \iff t^{su}(\mathbf{p}_{0,-i}) = t^{su}(\mathbf{p})$, the latter
coming from the non-increasing property of $t^{su}(\cdot)$. As a consequence,

\[
\{p_i \leq t^{su}(\mathbf{p}_{0,-i})\} = \{p_i \leq t^{su}(\mathbf{p})\}, \tag{17}
\]

with $t^{su}(\mathbf{p}_{0,-i}) = t^{su}(\mathbf{p})$ on these events. Therefore, using (17) and the first decomposition (11)
of the FDR, we derive the following equalities:

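\[
\begin{aligned}
\mathrm{FDR}(R_{t^{su}},P) &= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ \frac{\mathbf{1}\{p_i \leq t^{su}(\mathbf{p})\}}{\alpha \hat{G}(\mathbf{p}, t^{su}(\mathbf{p})) \vee (\alpha/m)} \right] \\
&= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ \frac{\mathbf{1}\{p_i \leq t^{su}(\mathbf{p})\}}{t^{su}(\mathbf{p})} \right] \\
&= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ \frac{\mathbf{1}\{p_i \leq t^{su}(\mathbf{p}_{0,-i})\}}{t^{su}(\mathbf{p}_{0,-i})} \right] \\
&= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ t^{su}(\mathbf{p}_{0,-i})^{-1}\, \mathbb{P}\big(p_i \leq t^{su}(\mathbf{p}_{0,-i}) \,\big|\, \mathbf{p}_{0,-i}\big) \right] \\
&= \alpha m_0(P)/m,
\end{aligned}
\]

where we assumed in the last equality both that $P \in \mathcal{P}^{I}$ and that condition (2) holds with equality.
To sum up, we have proved in this section the following result.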

Theorem 3.4. Consider the linear step-up procedure $R_{t^{su}}$ using the threshold defined in (15).
Then, for any $P \in \mathcal{P}^{pos}$, $\mathrm{FDR}(R_{t^{su}},P) \leq \alpha m_0(P)/m$. Moreover, the latter is an equality if
$P \in \mathcal{P}^{I}$ and (2) holds with equality.

This theorem is due to [5, 8]. The short proof mentioned above has been independently given
in [22, 47, 23]. Theorem 3.4 proves that the inequality "$\forall P \in \mathcal{P}^{pos}$, $\mathrm{FDR}(R_{t^{su}},P) \leq \alpha$" is sharp
as soon as (2) holds with equality and there exists $P \in \mathcal{P}^{I}$ such that $\mathcal{H}_0(P) = \mathcal{H}$, that is,
such that all $m$ null hypotheses are true.

Other instances of self-consistent procedures include linear "step-up-down" procedures as 
defined in [50]. Theorem 3.2 establishes that the FDR control also holds for these procedures, as 
proved in [12, 23]. 

3.3. Adaptive linear step-up procedures 

In this section we denote by $\pi_0(P)$ the proportion $m_0(P)/m$ of hypotheses that are true for $P$.
Since we aim at controlling the FDR at level $\alpha$ and not at level $\alpha \pi_0(P)$, Theorem 3.4 shows that
there is a potential power loss when using $t^{su}$ when the proportion $\pi_0(P)$ is small. A first idea is
to use the linear step-up procedure at level $\alpha^* = \min(\alpha/\pi_0(P), 1)$, that is, corresponding to the
threshold

\[
\begin{aligned}
t^*(\mathbf{p}) &= \max\{u \in [0,1] : \hat{G}(\mathbf{p},u) \geq u/\alpha^*\} && (18) \\
&= \max\{u \in [0,1] : \hat{G}(\mathbf{p},u) \geq u \pi_0(P)/\alpha\}. && (19)
\end{aligned}
\]

Note that (18) and (19) are equal because when $\alpha \geq \pi_0(P)$, the maximum is 1 in the two formulas.
From Theorem 3.4, threshold (19) provides a FDR smaller than $\alpha^* \pi_0(P) \leq \alpha$ for $P \in \mathcal{P}^{pos}$ and
a FDR equal to $\alpha$ when $P \in \mathcal{P}^{I}$, (2) holds with equality and $\alpha \leq \pi_0(P)$. Unfortunately, since $P$
is unknown, so is $\pi_0(P)$ and thus the threshold (19) is an unobservable "oracle" threshold.

An interesting challenge is to estimate $\pi_0(P)$ within (19) while still rigorously controlling the
FDR at level $\alpha$, despite the additional fluctuations added by the $\pi_0(P)$-estimation. This problem,
called $\pi_0(P)$-adaptive FDR control, has received growing attention in the last decade, see e.g.
[6, 56, 9, 28, 7, 41, 51, 13]. To investigate this issue, a natural idea is to consider a modified
linear step-up procedure using the threshold

\[
t^{su}_f(\mathbf{p}) = \max\{u \in [0,1] : \hat{G}(\mathbf{p},u) \geq u/(\alpha f(\mathbf{p}))\}, \tag{20}
\]

where $f(\mathbf{p}) > 0$ is an estimator of $(\pi_0(P))^{-1}$ to be chosen. The latter is called the adaptive linear
step-up procedure. It is sometimes additionally said to be "plug-in", because (20) corresponds to (19)
in which we have "plugged" an estimator of $(\pi_0(P))^{-1}$. Other types of adaptive procedures can
be defined, see Remark 3.6 below.

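We now describe a way to choose $f$ so that the control $\mathrm{FDR}(R_{t^{su}_f},P) \leq \alpha$ still holds. However,
we only focus on the case where the $p$-values are independent, that is, $P \in \mathcal{P}^{I}$. This restriction
is usual in studies providing an adaptive FDR control. First, to keep the non-increasing property
of the threshold $t^{su}_f(\cdot)$, we assume that $f(\cdot)$ is coordinate-wise non-increasing. Second, using
techniques similar to those of Section 3.2, we can write for any $P \in \mathcal{P}^{I}$,

\[
\begin{aligned}
\mathrm{FDR}(R_{t^{su}_f},P) &= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ f(\mathbf{p})\, \frac{\mathbf{1}\{p_i \leq t^{su}_f(\mathbf{p})\}}{t^{su}_f(\mathbf{p})} \right] \\
&\leq \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ f(\mathbf{p}_{0,-i})\, \frac{\mathbf{1}\{p_i \leq t^{su}_f(\mathbf{p})\}}{t^{su}_f(\mathbf{p})} \right] \\
&= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ f(\mathbf{p}_{0,-i})\, \mathbb{E}\left[ \frac{\mathbf{1}\{p_i \leq t^{su}_f(\mathbf{p})\}}{t^{su}_f(\mathbf{p})} \,\Big|\, \mathbf{p}_{0,-i} \right] \right] \\
&\leq \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ f(\mathbf{p}_{0,-i}) \right],
\end{aligned} \tag{21}
\]

where the first inequality uses $f(\mathbf{p}) \leq f(\mathbf{p}_{0,-i})$ and where we used Lemma 3.1 (i) in the last
inequality (conditionally on the $p$-values $(p_j, j \neq i)$, because $f$ is coordinate-wise non-increasing).
Additionally assuming that $f(\cdot)$ is permutation invariant, we can upper-bound the RHS of (21) by
using the Dirac-uniform configuration because $f(\cdot)$ is non-increasing. This gives rise to the
following result.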

Theorem 3.5. Consider the adaptive linear step-up procedure $R_{t^{su}_f}$ with a threshold defined in
(20) using a $(\pi_0(P))^{-1}$-estimator $f$ satisfying the following properties:

- $f(\cdot)$ is coordinate-wise non-increasing, that is, for all $q, q' \in [0,1]^m$ with $q'_i \leq q_i$ for all
$1 \leq i \leq m$, we have $f(q) \leq f(q')$;

- $f(\cdot)$ is permutation invariant, that is, for any permutation $\sigma$ of $\{1,\dots,m\}$ and any $q \in [0,1]^m$,
$f(q_1,\dots,q_m) = f(q_{\sigma(1)},\dots,q_{\sigma(m)})$;

- $f$ satisfies

\[
\forall m_0 \in \{1,\dots,m\}, \quad \mathbb{E}_{\mathbf{p} \sim DU(m_0-1,m)}(f(\mathbf{p})) \leq m/m_0, \tag{22}
\]

where $DU(k,m)$ denotes the Dirac-uniform distribution on $[0,1]^m$ for which the $k$ first
coordinates are i.i.d. uniform on $(0,1)$ and the remaining coordinates are equal to 0.

Then, for any $P \in \mathcal{P}^{I}$, $\mathrm{FDR}(R_{t^{su}_f},P) \leq \alpha$.

The method leading to the upper-bound (21) was investigated in [7] and described later in
detail in [13]. The simpler result presented in Theorem 3.5 appeared in [13]. It uses the
Dirac-uniform configuration as a least favorable configuration for the FDR. This kind of reasoning
has also been used in [23].

Let us now consider the problem of finding a "correct" estimator $f$ of $(\pi_0(P))^{-1}$. This issue
has an interest in its own right and many studies have investigated it since the first attempt in [52]
(see for instance the references in [14]). Here, we only deal with this problem from the FDR
control point of view, by providing two families of estimators that satisfy the assumptions of
Theorem 3.5. First, define the "Storey-type" estimators, which are of the form

\[
f_1(\mathbf{p}) = \frac{m(1-\lambda)}{\sum_{i=1}^{m} \mathbf{1}\{p_i > \lambda\} + 1},
\]

for $\lambda \in (0,1)$ ($\lambda$ not depending on $\mathbf{p}$). It is clearly non-increasing and permutation invariant.
Moreover, we can check that $f_1$ satisfies (22): for any $m_0 \in \{1,\dots,m\}$, considering $(U_i)_{1 \leq i \leq m_0-1}$
i.i.d. uniform on $(0,1)$,

\[
\mathbb{E}_{\mathbf{p} \sim DU(m_0-1,m)}(f_1(\mathbf{p})) = \frac{m}{m_0}\, \mathbb{E}\left[ \frac{m_0(1-\lambda)}{\sum_{i=1}^{m_0-1} \mathbf{1}\{U_i > \lambda\} + 1} \right] \leq \frac{m}{m_0},
\]

because for any $k \geq 2$, $q \in (0,1)$ and for $Y$ having a binomial distribution with parameters
$(k-1, q)$, we have $\mathbb{E}((1+Y)^{-1}) \leq (qk)^{-1}$, as stated e.g. in [7]. This type of estimator was
introduced in [55] and proved to lead to a correct FDR control in [56, 7].
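
To make the plug-in construction concrete, here is a small Python sketch of the adaptive linear step-up procedure (20) with the Storey-type estimator $f_1$. It is an illustration written for this section, not code from the cited references; the function names are hypothetical, indices are 0-based, and the threshold is capped at 1 as in (20).

```python
from typing import List, Sequence


def storey_inverse_pi0(pvalues: Sequence[float], lam: float) -> float:
    """Storey-type estimator f_1 of 1/pi_0: m(1 - lam) / (#{p_i > lam} + 1)."""
    m = len(pvalues)
    return m * (1.0 - lam) / (sum(1 for p in pvalues if p > lam) + 1)


def step_up_rejections(pvalues: Sequence[float], level: float) -> List[int]:
    """Linear step-up at an arbitrary level: threshold max{u in [0,1] : G(p,u) >= u/level}."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    k_hat = 0
    for k in range(1, m + 1):
        if ordered[k - 1] <= level * k / m:
            k_hat = k
    threshold = min(level * k_hat / m, 1.0)  # u ranges over [0, 1] only
    return [i for i, p in enumerate(pvalues) if p <= threshold]


def adaptive_linear_step_up(pvalues: Sequence[float], alpha: float, lam: float) -> List[int]:
    """Plug-in procedure: linear step-up run at the data-dependent level alpha * f_1(p)."""
    return step_up_rejections(pvalues, alpha * storey_inverse_pi0(pvalues, lam))


if __name__ == "__main__":
    p = [0.001, 0.004, 0.012, 0.024, 0.03, 0.041, 0.3, 0.6, 0.8, 0.9]
    print(step_up_rejections(p, 0.05))            # non-adaptive linear step-up
    print(adaptive_linear_step_up(p, 0.05, 0.5))  # adaptive version, lambda = 0.5
```

On this toy input the estimator exceeds 1, so the adaptive procedure runs at a larger level and rejects at least as many hypotheses as the plain linear step-up procedure.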

The second family of estimators satisfying the assumptions of Theorem 3.5 is the
"quantile-type" family, defined by

\[
f_2(\mathbf{p}) = \frac{m(1 - p_{(k_0)})}{m - k_0 + 1},
\]

for $k_0 \in \{1,\dots,m\}$ ($k_0$ not depending on $\mathbf{p}$). The latter may be seen as Storey-type estimators using
a data-dependent $\lambda = p_{(k_0)}$. Clearly, $f_2(\cdot)$ is non-increasing and permutation invariant.
Additionally, $f_2(\cdot)$ enjoys (22) because for any $m_0 \in \{1,\dots,m\}$, considering $(U_i)_{1 \leq i \leq m_0-1}$ i.i.d. uniform
on $(0,1)$ ordered as $U_{(1)} \leq \dots \leq U_{(m_0-1)}$,

\[
\begin{aligned}
\mathbb{E}_{\mathbf{p} \sim DU(m_0-1,m)}(f_2(\mathbf{p})) &= \mathbb{E}\left[ \frac{m(1 - U_{(k_0-m+m_0-1)})}{m - k_0 + 1} \right]
= \frac{m(1 - \mathbb{E}[U_{(k_0-m+m_0-1)}])}{m - k_0 + 1} \\
&= \frac{m(1 - (k_0 - m + m_0 - 1)_+/m_0)}{m - k_0 + 1}
\leq \frac{m}{m_0},
\end{aligned}
\]

by using the convention $U_{(j)} = 0$ when $j \leq 0$. These quantile-type estimators have been proved
to lead to a correct FDR control in [7]. The simple proof above was given in [13].
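
The link between the two families can be checked numerically. The sketch below (illustrative Python, hypothetical function names, 0-based lists) computes $f_2$ and compares it with the Storey-type formula evaluated at $\lambda = p_{(k_0)}$; the two coincide when the $p$-values have no ties at $p_{(k_0)}$, since then exactly $m - k_0$ of them exceed $\lambda$.

```python
from typing import Sequence


def quantile_inverse_pi0(pvalues: Sequence[float], k0: int) -> float:
    """Quantile-type estimator f_2 of 1/pi_0: m(1 - p_(k0)) / (m - k0 + 1)."""
    m = len(pvalues)
    if not 1 <= k0 <= m:
        raise ValueError("k0 must lie in {1, ..., m}")
    p_k0 = sorted(pvalues)[k0 - 1]  # k0-th smallest p-value
    return m * (1.0 - p_k0) / (m - k0 + 1)


def storey_inverse_pi0(pvalues: Sequence[float], lam: float) -> float:
    """Storey-type estimator f_1, repeated here so the block is self-contained."""
    m = len(pvalues)
    return m * (1.0 - lam) / (sum(1 for p in pvalues if p > lam) + 1)


if __name__ == "__main__":
    p = [0.001, 0.004, 0.012, 0.024, 0.03, 0.041, 0.3, 0.6, 0.8, 0.9]
    k0 = 7
    lam = sorted(p)[k0 - 1]  # data-dependent lambda = p_(k0)
    print(quantile_inverse_pi0(p, k0), storey_inverse_pi0(p, lam))
```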

Which choice should we make for $\lambda$ or $k_0$? Using extensive simulations (including other types
of adaptive procedures), it was recommended in [13] to choose as estimator $f_1$ with $\lambda$ close
to $\alpha$, because the corresponding procedure shows a "good" power under independence while it
maintains a correct FDR control under positive dependencies (in the equi-correlated Gaussian
one-sided model described in Example 1.2). Obviously, a "dynamic" choice of $\lambda$ (i.e., using
the data) can increase the accuracy of the $(\pi_0(P))^{-1}$ estimation and thus should lead to a better
procedure. However, proving that the corresponding FDR control remains valid in this case is an
open issue to our knowledge. Also, outside the case of the particular equi-correlated Gaussian
dependence structure, very little is known about adaptive FDR control.

Remark 3.6. Some authors have proposed adaptive procedures that are not of the "plug-in"
form (20). For instance, we can define the class of "one-stage step-up adaptive procedures", for
which the threshold takes the form $t^{os}(\mathbf{p}) = \max\{u \in [0,1] : \hat{G}(\mathbf{p},u) \geq r_\alpha(u)\}$, where $r_\alpha(\cdot)$ is
a non-decreasing function that depends neither on $\mathbf{p}$ nor on $\pi_0(P)$, see, e.g., [41, 23, 13]. As
an illustration, Blanchard and Roquain (2009) have introduced the curve defined by $r_\alpha(t) =
(1 + m^{-1})t/(t + \alpha(1-\alpha))$ if $t \leq \alpha$ and $r_\alpha(t) = +\infty$ otherwise, see [13]. They have proved that
the corresponding step-up procedure controls the FDR at level $\alpha$ in the independent model
(by using the property of Lemma 3.1 (i)). Furthermore, Finner et al. (2009) have introduced the
"asymptotically optimal rejection curve" (AORC) defined by $r_\alpha(t) = t/(\alpha + t(1-\alpha))$, see [23].
By contrast with the framework of the present paper, they considered the FDR control only in
an asymptotic manner where the number $m$ of hypotheses tends to infinity. They have proved
that the AORC enjoys the following (asymptotic) optimality property: while several adaptive
procedures based on the AORC provide a valid asymptotic FDR control (under independence),
the AORC maximizes the asymptotic power among broad classes of adaptive procedures that
asymptotically control the FDR, see Theorems 5.1, 5.3 and 5.5 in [23].

3.4. Case of arbitrary dependencies 

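Many corrections of the linear step-up procedure are available to maintain the FDR control when
the $p$-value family has arbitrary and unknown dependencies. We describe here the so-called
"Occam's hammer" approach presented in [11]. Surprisingly, it allows us to recover and extend
the well-known "Benjamini-Yekutieli" correction [8] by only using Fubini's theorem. Let us
consider

\[
\begin{aligned}
t^{su}_\beta(\mathbf{p}) &= \max\{u \in [0,1] : \hat{G}(\mathbf{p},\beta(u)) \geq u/\alpha\} && (23) \\
&= \max\{u \in \{\alpha k/m, 0 \leq k \leq m\} : \hat{G}(\mathbf{p},\beta(u)) \geq u/\alpha\} \\
&= \alpha/m \times \max\{0 \leq k \leq m : p_{(k)} \leq \beta(\alpha k/m)\}, && (24)
\end{aligned}
\]

for a non-decreasing function $\beta : \mathbb{R}^+ \to \mathbb{R}^+$. Then the FDR of $R_{\beta(t^{su}_\beta)}$ can be written as follows:
for any $P \in \mathcal{P}$,

\[
\begin{aligned}
\mathrm{FDR}(R_{\beta(t^{su}_\beta)},P) &= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ \frac{\mathbf{1}\{p_i \leq \beta(t^{su}_\beta(\mathbf{p}))\}}{t^{su}_\beta(\mathbf{p})} \right] \\
&= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{E}\left[ \mathbf{1}\{p_i \leq \beta(t^{su}_\beta(\mathbf{p}))\} \int_0^{+\infty} u^{-2}\, \mathbf{1}\{t^{su}_\beta(\mathbf{p}) \leq u\}\, du \right].
\end{aligned}
\]

Next, using Fubini's theorem, we obtain

\[
\begin{aligned}
\mathrm{FDR}(R_{\beta(t^{su}_\beta)},P) &= \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \int_0^{+\infty} u^{-2}\, \mathbb{P}\big(p_i \leq \beta(t^{su}_\beta(\mathbf{p})),\, t^{su}_\beta(\mathbf{p}) \leq u\big)\, du \\
&\leq \alpha m^{-1} \sum_{i \in \mathcal{H}_0(P)} \int_0^{+\infty} u^{-2}\, \mathbb{P}(p_i \leq \beta(u))\, du \\
&\leq \alpha\, \frac{m_0(P)}{m} \int_0^{+\infty} u^{-2} \beta(u)\, du,
\end{aligned} \tag{25}
\]

where the first inequality uses that $\beta$ is non-decreasing and the second one uses (2).
Therefore, choosing any non-decreasing function $\beta$ such that $\int_0^{+\infty} u^{-2} \beta(u)\, du = 1$ provides a
valid FDR control. This leads to the following result: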

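Theorem 3.7. Consider a function $\beta : \mathbb{R}^+ \to \mathbb{R}^+$ of the following form: for all $u \geq 0$,

\[
\beta(u) = \sum_{i : 1 \leq i \leq m,\ \alpha i/m \leq u} (\alpha i/m)\, \nu_i, \tag{26}
\]

where the $\nu_i$'s are nonnegative with $\nu_1 + \dots + \nu_m = 1$. Consider the step-up procedure $R_{\beta(t^{su}_\beta)}$
using $t^{su}_\beta$ defined by (23). Then for any $P \in \mathcal{P}$, $\mathrm{FDR}(R_{\beta(t^{su}_\beta)},P) \leq \alpha m_0(P)/m$.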
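
As a quick check, offered here as a worked computation rather than as part of the original argument, the normalisation required by the bound (25) can be verified directly for the step functions (26). The only fact used is that the $i$-th term of (26) contributes exactly when $u \geq \alpha i/m$:

```latex
\int_0^{+\infty} u^{-2} \beta(u)\, du
  = \sum_{i=1}^{m} \frac{\alpha i}{m}\, \nu_i \int_{\alpha i/m}^{+\infty} u^{-2}\, du
  = \sum_{i=1}^{m} \frac{\alpha i}{m}\, \nu_i \cdot \frac{m}{\alpha i}
  = \sum_{i=1}^{m} \nu_i
  = 1 .
```

Hence every choice of nonnegative weights summing to one yields a function $\beta$ for which (25) gives the bound $\alpha m_0(P)/m$.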
Note that the function $\beta$ defined by (26) takes the value $(\alpha/m)\nu_1 + \dots + (\alpha i/m)\nu_i$ at each
$u = \alpha i/m$ and is constant on each interval $[\alpha i/m, \alpha(i+1)/m)$ and on $(\alpha, \infty)$. Thus, it always
satisfies $\beta(u) \leq u$, for any $u \geq 0$. This means that the procedure $R_{\beta(t^{su}_\beta)}$ always rejects
fewer hypotheses than the linear step-up procedure $R_{t^{su}}$. Therefore, while $R_{\beta(t^{su}_\beta)}$ provides a FDR
control under no assumption about the $p$-value dependency structure, it is substantially more
conservative than $R_{t^{su}}$ under weak PRDS dependencies between the $p$-values.

As an illustration, taking $\nu_i = i^{-1}\delta^{-1}$ for $\delta = 1 + 1/2 + \dots + 1/m$, we obtain $\beta(\alpha i/m) =
\delta^{-1}\alpha i/m$, which corresponds to the linear step-up procedure, except that the level $\alpha$ has been
divided by $\delta \simeq \log(m)$. This is the so-called Benjamini-Yekutieli procedure proposed in [8].
Theorem 3.7 thus recovers Theorem 1.3 of [8]. We mention another example, maybe less classical,
to illustrate the flexibility of the choice of $\beta$ in Theorem 3.7. By taking $\nu_{m/2} = 1$ and $\nu_i = 0$ for
$i \neq m/2$ (assuming that $m/2$ is an integer), we obtain $\beta(\alpha i/m) = (\alpha/2)\, \mathbf{1}\{i \geq m/2\}$. In that case,
the final procedure $R_{\beta(t^{su}_\beta)}$ rejects the hypotheses corresponding to $p$-values smaller than $\alpha/2$ if
$2p_{(m/2)} \leq \alpha$ and rejects no hypothesis otherwise. Theorem 3.7 ensures that this procedure also
controls the FDR, under no assumption on the model dependency. Many other choices of $\beta$ are
given in Section 4.2.1 of [12].
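
Both examples fit the same template, which the following Python sketch tries to capture. It is a minimal illustration written for this section (the function names are hypothetical, indices are 0-based): the weights $\nu_1,\dots,\nu_m$ are passed as a list, $\beta$ is tabulated at the points $\alpha k/m$ following (26), and the rejection rule follows (24).

```python
from typing import List, Sequence


def beta_step_up(pvalues: Sequence[float], alpha: float, nu: Sequence[float]) -> List[int]:
    """Step-up procedure of Theorem 3.7: reject p_i <= beta(t), with t given by (24)."""
    m = len(pvalues)
    if len(nu) != m:
        raise ValueError("one weight nu_i per hypothesis is expected")
    # beta_at[k] = beta(alpha*k/m) = sum_{i <= k} (alpha*i/m) * nu_i, with beta(0) = 0
    beta_at = [0.0]
    for i in range(1, m + 1):
        beta_at.append(beta_at[-1] + alpha * i / m * nu[i - 1])
    ordered = sorted(pvalues)
    k_hat = 0
    for k in range(1, m + 1):
        if ordered[k - 1] <= beta_at[k]:
            k_hat = k  # largest k with p_(k) <= beta(alpha*k/m)
    cutoff = beta_at[k_hat]
    return [i for i, p in enumerate(pvalues) if p <= cutoff]


def benjamini_yekutieli_weights(m: int) -> List[float]:
    """Weights nu_i = 1/(i*delta) with delta = 1 + 1/2 + ... + 1/m."""
    delta = sum(1.0 / i for i in range(1, m + 1))
    return [1.0 / (i * delta) for i in range(1, m + 1)]


if __name__ == "__main__":
    p = [0.001, 0.015, 0.02, 0.5, 0.9]
    print(beta_step_up(p, 0.05, benjamini_yekutieli_weights(5)))
    # Second example of the text: all the mass on i = m/2 (here m = 4).
    print(beta_step_up([0.01, 0.02, 0.3, 0.4], 0.05, [0.0, 1.0, 0.0, 0.0]))
```

With the Benjamini-Yekutieli weights the tabulated values reduce to $\alpha k/(m\delta)$, so the first call is the linear step-up procedure run at level $\alpha/\delta$.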

Finally, let us underline that any FDR control valid under arbitrary dependency suffers from a
lack of interpretability for the underlying FDP, as discussed in Section 6.2.

Remark 3.8 (Sharpness of the bound in Theorem 3.7). In Lemma 3.1 (ii) of [36] (see also
[31]), a specifically crafted $p$-value distribution was built on $[0,1]^m$ (depending on $\beta$) for which
the FDR of $R_{\beta(t^{su}_\beta)}$ is equal to $\alpha$ (and $m_0(P) = m$). If the underlying model $\mathcal{P}$ is such that
$(p_i(X))_{1 \leq i \leq m}$ can have this very specific distribution for some $P \in \mathcal{P}$, the inequality "$\forall P \in \mathcal{P}$,
$\mathrm{FDR}(R_{\beta(t^{su}_\beta)},P) \leq \alpha$" in Theorem 3.7 is sharp. However, for a "realistic" model $\mathcal{P}$, this $p$-value
distribution is rarely attained because it assumes quite unrealistic dependencies between the
$p$-values. Related to that, several simulation experiments showed that the standard LSU procedure
still provides a good FDR control under "realistic" dependencies, see e.g. [21, 35]. This means
that the corrections defined in this section are generally very conservative for real-life data,
because their actually achieved FDR is much smaller than $\alpha m_0(P)/m$. Finally, another drawback of
the bound of Theorem 3.7 is that it is much smaller than $\alpha$ when $\pi_0(P) = m_0(P)/m$ is small. To
investigate this problem, one may think of applying techniques similar to those of Section 3.3.
However, the problem of adaptive FDR control is much more challenging under arbitrary dependency.
The few results that are available in this framework are very conservative, see [13].

Remark 3.9 (Aggregation of dependent $p$-values). Consider Theorem 3.7 in the particular case
where all $p$-values test the same null hypothesis, that is, $\Theta_{0,i} = \Theta_0$ for any $i$. According to
Remark 1.6, we obtain a new test of level $\alpha$, by rejecting $H_0$: "$P \in \Theta_0$" if the procedure $R_{\beta(t^{su}_\beta)}$
defined in Theorem 3.7 rejects at least one null hypothesis, that is, if there exists $k \geq 1$ such that
$p_{(k)} \leq \beta(\alpha k/m)$. As an illustration, taking $\nu_{\gamma m} = 1$ and $\nu_i = 0$ for $i \neq \gamma m$, for a given $\gamma \in (0,1]$
such that $\gamma m \in \{1,\dots,m\}$, we obtain $\beta(\alpha i/m) = (\alpha\gamma)\, \mathbf{1}\{i \geq \gamma m\}$, which gives rise to a test
rejecting whenever $p_{(\gamma m)} \leq \alpha\gamma$. This defines a new global $p$-value

\[
p' = \min(p_{(\gamma m)} \gamma^{-1}, 1)
\]

for testing $H_0$ that can be seen as an aggregate of the original $p$-values. Thus, Theorem 3.7 shows
that $\mathbb{P}(p' \leq \alpha) \leq \alpha$ under the null, for arbitrary dependencies between the original $p$-values.

Submitted to the Journal de la Société Française de Statistique

File: Roquain-jsfds-version2.tex, compiled with jsfds, version : 2009/12/09 
date: March 15, 2011 






Etienne Roquain 



Interestingly, this aggregation procedure was independently discovered in [39] in a context where
one aims at combining $p$-values that were obtained by different splits of the original sample. Also
note that $\gamma = 1/m$ corresponds to the Bonferroni aggregation procedure. Let us finally discuss
the choice $\gamma = 1/2$ (assuming that $m/2$ is an integer). In that case, the aggregated $p$-value is
$p' = \min(2 p_{(m/2)}, 1)$. According to Remark 3.8, the factor "2" in the latter is needed in theory
but may be over-estimated for a "realistic" distribution of the $p$-value family. As a matter of fact,
van de Wiel et al. (2009) have (theoretically) proved that this factor can be dropped as soon as
the $p$-value family has some underlying multivariate Gaussian dependency structure, see [57].
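
The aggregation rule of this remark is short enough to state in code. The following Python sketch is only an illustration (the function name `aggregate_pvalue` is hypothetical); it assumes that $\gamma m$ is an integer between 1 and $m$, as in the remark, and raises an error otherwise.

```python
import math
from typing import Sequence


def aggregate_pvalue(pvalues: Sequence[float], gamma: float) -> float:
    """Global p-value min(p_(gamma*m) / gamma, 1) for p-values testing the same null."""
    m = len(pvalues)
    rank = round(gamma * m)
    if rank < 1 or rank > m or not math.isclose(gamma * m, rank):
        raise ValueError("gamma * m must be an integer in {1, ..., m}")
    return min(sorted(pvalues)[rank - 1] / gamma, 1.0)


if __name__ == "__main__":
    p = [0.01, 0.2, 0.03, 0.5]
    print(aggregate_pvalue(p, 0.5))   # twice the median-rank p-value, capped at 1
    print(aggregate_pvalue(p, 0.25))  # gamma = 1/m: Bonferroni aggregation, m * min(p)
```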

4. k-FWER control

The methodology presented in this section for controlling the $k$-FWER under arbitrary
dependencies can probably be attributed to many authors, e.g. [33, 63, 44, 45]. Here, we opted for a general
presentation which emphasizes the rationale of the mathematical argument. This approach has
been sketched in the talk [10] and investigated more deeply in [30] where it is referred to as the
"sequential rejection principle". While the latter point of view yields elegant proofs, it
is also useful for developing new FWER controlling procedures (e.g., hierarchical testing,
Shaffer improvement), see [30, 29, 34]. This methodology was initially developed for the FWER.
We propose in Section 4.4 a new extension to the $k$-FWER.

In this section, for simplicity, we drop the explicit dependence of the multiple testing
procedure $R$ w.r.t. $\mathbf{p}$ in the notation. The parameter $k$ is fixed in $\{1,\dots,m\}$.

4.1. Subset-indexed family 

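As a starting point, we assume that there exists a subset-indexed family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ of multiple
testing procedures satisfying the two following assumptions:

• $R_{\mathcal{C}}$ is non-increasing, that is,

\[
\forall \mathcal{C}, \mathcal{C}' \subset \mathcal{H} \text{ such that } \mathcal{C} \subset \mathcal{C}', \text{ we have } R_{\mathcal{C}'} \subset R_{\mathcal{C}}; \tag{NI}
\]

• $R_{\mathcal{C}}$ controls the $k$-FWER when $\mathcal{C}$ is equal to the subset of true null hypotheses, that is,

\[
\forall P \in \mathcal{P}, \quad k\text{-FWER}(R_{\mathcal{H}_0(P)}, P) \leq \alpha. \tag{FWC$_0$}
\]

A natural way of deriving such a family is to take a thresholding-based family of the form

\[
R_{\mathcal{C}} = \{1 \leq i \leq m : p_i \leq t_{\mathcal{C}}\}, \tag{27}
\]

where $t_{\mathcal{C}} \in [0,1]$ is a threshold which possibly depends on the data $\mathbf{p} = (p_i)_{1 \leq i \leq m}$. Assumption
(NI) then holds as soon as we take $t_{\mathcal{C}}$ non-increasing in $\mathcal{C}$ (if $\mathcal{C} \subset \mathcal{C}'$ then $t_{\mathcal{C}'} \leq t_{\mathcal{C}}$). However,
$t_{\mathcal{C}}$ should be carefully chosen in order to ensure (FWC$_0$), as we discuss below.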

A first instance of a thresholding-based family satisfying (NI)-(FWC$_0$) is the "Bonferroni
family" that chooses $t_{\mathcal{C}} = \min(\alpha k/|\mathcal{C}|, 1)$. Condition (FWC$_0$) results from Markov's inequality:

\[
\mathbb{P}(|\mathcal{H}_0(P) \cap R_{\mathcal{H}_0(P)}| \geq k) \leq k^{-1} \sum_{i \in \mathcal{H}_0(P)} \mathbb{P}(p_i \leq t_{\mathcal{H}_0(P)}) \leq |\mathcal{H}_0(P)|\, t_{\mathcal{H}_0(P)}/k \leq \alpha.
\]

This family is not adaptive w.r.t. the dependence structure of the $p$-values. As an illustration,
when the $p$-values corresponding to true nulls are all equal, say, to $p_{i_0}$, $i_0 \in \mathcal{H}_0(P)$, we have

\[
\mathbb{P}(|\mathcal{H}_0(P) \cap R_{\mathcal{H}_0(P)}| \geq k) = \mathbb{P}(|\mathcal{H}_0(P)|\, \mathbf{1}\{p_{i_0} \leq t_{\mathcal{H}_0(P)}\} \geq k) \leq t_{\mathcal{H}_0(P)}.
\]

Thus, under this extreme dependency structure, the Bonferroni threshold $\min(\alpha k/|\mathcal{C}|, 1)$ can be
replaced by $\alpha$ (the only case which matters is $|\mathcal{C}| \geq k$, see Remark 4.2 below). Hence, there is a
potential loss when using the Bonferroni family. In practice, the Bonferroni family is often used
as a "benchmark family" for evaluating the performance of other families.

In order to improve on the Bonferroni family, one can try to choose a threshold $t_{\mathcal{C}}$ that captures
the dependencies between the $p$-values while still satisfying (NI)-(FWC$_0$). For this, first note that
for $R_{\mathcal{C}}$ defined by (27),

\[
\begin{aligned}
k\text{-FWER}(R_{\mathcal{C}},P) &= \mathbb{P}(\exists\, i_1,\dots,i_k \in \mathcal{H}_0(P) \text{ distinct} : \forall i \in \{i_1,\dots,i_k\},\ p_i \leq t_{\mathcal{C}}) \\
&= \mathbb{P}(k\text{-}\min\{p_i, i \in \mathcal{H}_0(P)\} \leq t_{\mathcal{C}}),
\end{aligned}
\]

where $k$-$\min\{p_i, i \in \mathcal{H}_0(P)\}$ denotes the $k$-th smallest element of $\{p_i, i \in \mathcal{H}_0(P)\}$. Therefore,
a natural choice for $t_{\mathcal{C}}$ is the $\alpha$-quantile of the distribution of $k$-$\min\{p_i, i \in \mathcal{C}\}$. However, the
latter is generally unknown because the underlying distribution $P$ is unknown. An idea is to
approximate it by using a randomized thresholding procedure. This method can be applied when
the null hypothesis is invariant under the action of a finite group of transformations of the original
observation set $\mathcal{X}$ onto itself (such a transformation can be for instance a permutation or a
sign-flipping, see [44, 45, 1, 2]). For a recent and general description of this method, we refer the
reader to Theorem 2 of [30] (while [30] have developed this method only for $k = 1$, it can
be directly generalized to the case of $k > 1$). The resulting family satisfies (NI)-(FWC$_0$) while
it is "adaptive" with respect to the $p$-value dependence structure, in the sense that $t_{\mathcal{C}} = t_{\mathcal{C}}(\mathbf{p})$
implicitly takes into account the potential relations existing between the $p$-values.

Remark 4.1. The monotonicity condition introduced in [30] can be rewritten with our notation
as follows:

\[
\forall \mathcal{C}, \mathcal{C}' \subset \mathcal{H} \text{ such that } \mathcal{C} \subset \mathcal{C}', \text{ we have } R_{\mathcal{C}'} \cap \mathcal{C} \subset R_{\mathcal{C}}. \tag{wNI}
\]

Condition (wNI) is weaker than condition (NI). Thus, at first sight, the setting of [30] is more
general than ours. The next reasoning shows that the two settings are in fact equivalent. Since
the condition (FWC$_0$) only depends on the set $R_{\mathcal{C}} \cap \mathcal{C}$ (for $\mathcal{C} = \mathcal{H}_0(P)$), we can add the
elements of $\mathcal{C}^c$ to the rejection set $R_{\mathcal{C}}$ while still maintaining (FWC$_0$) true. Therefore, starting
from a subset-indexed family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ satisfying the weaker assumptions (wNI)-(FWC$_0$), we
may define a new subset-indexed family $\{R'_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ satisfying our assumptions (NI)-(FWC$_0$),
by letting $R'_{\mathcal{C}} = R_{\mathcal{C}} \cup \mathcal{C}^c$, and then apply to this family the methodology described in the next
sections. Moreover, by anticipating the definition of the FWER-controlling algorithm that will
be presented in Section 4.4, we can easily check that the output of this algorithm applied to the
family $\{R'_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ is the same as that of the algorithm of [30] applied to the family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$. As a
consequence, our framework covers the original setting of [30].

Remark 4.2. Any subset-indexed family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ satisfying (NI)-(FWC$_0$) can be modified
in the following way: take $R'_{\mathcal{C}} = \mathcal{H}$ (reject all hypotheses) when $|\mathcal{C}| < k$ and $R'_{\mathcal{C}} = R_{\mathcal{C}}$
otherwise. This maintains the conditions (NI)-(FWC$_0$), because the $k$-FWER is always zero when
$|\mathcal{H}_0(P)| < k$.

In what follows, we investigate the problem of the $k$-FWER control once we have fixed a
subset-indexed family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ satisfying (NI)-(FWC$_0$).

4.2. Single-step method 

From assumption (FWC$_0$), the procedure $R_{\mathcal{H}_0(P)}$ using $\mathcal{C} = \mathcal{H}_0(P)$ controls the $k$-FWER. Clearly,
this procedure cannot be used because $\mathcal{H}_0(P)$ depends on the unknown underlying distribution $P$
of the data. We can use instead $R_{\mathcal{C}}$ with $\mathcal{C} = \mathcal{H}$ because, from the two assumptions (NI)-(FWC$_0$)
above, we have $k\text{-FWER}(R_{\mathcal{H}},P) \leq k\text{-FWER}(R_{\mathcal{H}_0(P)},P) \leq \alpha$. This implies that $R_{\mathcal{H}}$ always
controls the $k$-FWER at level $\alpha$. The latter is generally called the single-step procedure (associated
to the family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$). However, we argue that $R_{\mathcal{H}}$ could often be too conservative w.r.t.
$R_{\mathcal{H}_0(P)}$, for the two following reasons:

- $\mathcal{H}_0(P)$ can be much smaller than $\mathcal{H}$;

- the way the procedures $\{R_{\mathcal{C}}\}$ have been built implicitly assumed that $\mathcal{C} = \mathcal{H}_0(P)$ and can
be very conservative when $\mathcal{C}$ is much larger than $\mathcal{H}_0(P)$.

For instance, these behaviors have been extensively discussed in [2] for particular
Rademacher-resampled thresholding procedures. Therefore, we seek a procedure controlling the $k$-FWER
which is "close" to $R_{\mathcal{H}_0(P)}$ and which can be derived from the family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ via a simple
algorithm.

4.3. Step-down method for FWER 

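We present in this section the special case of $k = 1$, following the approach of [44] with the
presentation proposed in [10, 30]. Let us denote by $A_{\mathcal{C}}$ the sets $(R_{\mathcal{C}})^c$ of non-rejected hypotheses
for the subset-indexed family. Consider the event

\[
\Omega_0 = \{R_{\mathcal{H}_0(P)} \cap \mathcal{H}_0(P) = \emptyset\} = \{\mathcal{H}_0(P) \subset A_{\mathcal{H}_0(P)}\}.
\]

By assumption (FWC$_0$), we have $\mathbb{P}(\Omega_0) \geq 1 - \alpha$. Since from (NI), $A_{\mathcal{C}}$ is non-decreasing in $\mathcal{C}$,
the following holds on $\Omega_0$: for any $\mathcal{C} \subset \mathcal{H}$,

\[
\mathcal{H}_0(P) \subset \mathcal{C} \implies A_{\mathcal{H}_0(P)} \subset A_{\mathcal{C}} \implies \mathcal{H}_0(P) \subset A_{\mathcal{C}}. \tag{28}
\]

Thus, on the event $\Omega_0$, taking $\mathcal{C} = \mathcal{C}_0 = \mathcal{H}$ in (28) gives that $\mathcal{H}_0(P) \subset A_{\mathcal{C}_0}$, which in turn implies
$\mathcal{H}_0(P) \subset A_{\mathcal{C}_1}$ by taking $\mathcal{C} = \mathcal{C}_1 = A_{\mathcal{C}_0}$ in (28), and so on. By recursion, this proves the following
result:

Theorem 4.3. Assume that a family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ of multiple testing procedures satisfies conditions
(NI) and (FWC$_0$) and consider the corresponding family of non-rejected hypotheses $\{A_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$.
Define $\hat{\mathcal{C}}$ by the following "step-down" recursion:

• Initialization: $\mathcal{C}_0 = \mathcal{H}$;

• Step $j \geq 1$: let $\mathcal{C}_j = A_{\mathcal{C}_{j-1}}$. If $\mathcal{C}_j = \mathcal{C}_{j-1}$, let $\hat{\mathcal{C}} = \mathcal{C}_j$ and stop. Otherwise go to step $j+1$.

Then the procedure $R = (\hat{\mathcal{C}})^c$, which also equals $R_{\hat{\mathcal{C}}}$, controls the FWER at level $\alpha$ for any
$P \in \mathcal{P}$.

Note that for all $j \geq 0$, we have $\mathcal{C}_{j+1} \subset \mathcal{C}_j$, because $\mathcal{C}_1 \subset \mathcal{C}_0$ and $A_{\mathcal{C}}$ is non-decreasing
in $\mathcal{C}$. Thus, the set of rejected hypotheses can only increase during the step-down algorithm. In
particular, the final procedure $(\hat{\mathcal{C}})^c = R_{\hat{\mathcal{C}}}$ is always less conservative than the single-step procedure
$R_{\mathcal{H}}$ for the same FWER control. Thus, using a step-down algorithm is never less powerful
than the single-step method.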

Example 4.4 (Bonferroni step-down procedure for FWER control). Theorem 4.3 can be used
with the Bonferroni family $R_{\mathcal{C}} = \{1 \leq i \leq m : p_i \leq \alpha/|\mathcal{C}|\}$. In that case, by reordering the
$p$-values $p_{(1)} \leq \dots \leq p_{(m)}$ (with $p_{(0)} = 0$), the corresponding step-down procedure defined in
Theorem 4.3 can be reformulated as rejecting the nulls with $p_i \leq \alpha/(m - \hat{\ell} + 1)$, where $\hat{\ell} =
\max\{\ell \in \{0,1,\dots,m\} : \forall \ell' \leq \ell,\ p_{(\ell')} \leq \alpha/(m - \ell' + 1)\}$. This is the well-known step-down Holm
procedure which was introduced and proved to control the FWER in [33]. By contrast with
step-up procedures, the step-down Holm procedure starts from the most significant $p$-value and stops
the first time that an (ordered) $p$-value exceeds the critical curve. This is illustrated in Figure 3.




Figure 3. Illustration of the two equivalent definitions of Holm's procedure. The left picture is the classical
step-down representation: ordered $p$-values together with the solid curve $\ell \mapsto \alpha/(m - \ell + 1)$. The filled points represent
$p$-values that correspond to the rejected hypotheses. The right picture illustrates the algorithm of Theorem 4.3:
ordered $p$-values with the three thresholds $\alpha/10$ (step 1), $\alpha/7$ (step 2) and $\alpha/5$ (step 3). For $i \in \{1,2\}$, the points
filled with "i" are rejected in the $i$-th step of the algorithm. Both pictures use the same $p$-values and $m = 10$; $\alpha = 0.5$.
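
The recursion of Theorem 4.3 and its Holm reformulation can be compared on a small example. The Python sketch below is an illustration only (all function names are hypothetical, hypotheses are indexed from 0): `step_down` implements the generic recursion for any family given as a function from a subset to its rejection set, `bonferroni_family` builds the Bonferroni family of Example 4.4, and `holm` is the classical ordered formulation.

```python
from typing import Callable, FrozenSet, Sequence, Set


def step_down(m: int, reject: Callable[[FrozenSet[int]], Set[int]]) -> Set[int]:
    """Recursion of Theorem 4.3: C_0 = H, C_j = A_{C_{j-1}} = H minus R_{C_{j-1}}."""
    everything = frozenset(range(m))
    current = everything
    for _ in range(m + 1):  # under (NI) the sets C_j shrink, so m + 1 steps suffice
        nxt = everything - frozenset(reject(current))
        if nxt == current:
            break
        current = nxt
    return set(everything - current)  # final rejection set, the complement of C-hat


def bonferroni_family(pvalues: Sequence[float], alpha: float):
    """Family R_C = {i : p_i <= alpha/|C|}; threshold 1 for the empty set by convention."""
    def reject(subset: FrozenSet[int]) -> Set[int]:
        threshold = min(alpha / len(subset), 1.0) if subset else 1.0
        return {i for i, p in enumerate(pvalues) if p <= threshold}
    return reject


def holm(pvalues: Sequence[float], alpha: float) -> Set[int]:
    """Classical Holm procedure: stop at the first ordered p-value above alpha/(m - l + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected: Set[int] = set()
    for rank, i in enumerate(order, start=1):
        if pvalues[i] > alpha / (m - rank + 1):
            break
        rejected.add(i)
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.012, 0.02, 0.04, 0.9]
    print(step_down(len(p), bonferroni_family(p, 0.05)), holm(p, 0.05))
```

Passing the family as a function keeps `step_down` independent of the Bonferroni choice, so a resampling-based family could be plugged in without changing the recursion.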



4.4. Step-down method for k-FWER 

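We would like to generalize Theorem 4.3 to the case of the $k$-FWER. This time, we should
consider the event

\[
\Omega_0 = \{|R_{\mathcal{H}_0(P)} \cap \mathcal{H}_0(P)| \leq k-1\} = \{\exists I_0 \subset \mathcal{H}, |I_0| = k-1 : \mathcal{H}_0(P) \subset A_{\mathcal{H}_0(P)} \cup I_0\},
\]

which satisfies by assumption $\mathbb{P}(\Omega_0) \geq 1 - \alpha$. For any subset $\mathcal{C} \subset \mathcal{H}$, let

\[
\phi(\mathcal{C}) = \bigcup_{I \subset \mathcal{H},\, |I| = k-1} A_{\mathcal{C} \cup I} = \bigcup_{I \subset \mathcal{C}^c,\, |I| \leq k-1} A_{\mathcal{C} \cup I}. \tag{29}
\]

Then we may prove that the following holds: on the event $\Omega_0$, for any $\mathcal{C} \subset \mathcal{H}$,

\[
\begin{aligned}
\exists I \subset \mathcal{H}, |I| = k-1 : \mathcal{H}_0(P) \subset \mathcal{C} \cup I
&\implies \exists I \subset \mathcal{H}, |I| = k-1 : A_{\mathcal{H}_0(P)} \subset A_{\mathcal{C} \cup I} \subset \phi(\mathcal{C}) \\
&\implies \exists I' \subset \mathcal{H}, |I'| = k-1 : \mathcal{H}_0(P) \subset \phi(\mathcal{C}) \cup I'.
\end{aligned}
\]

The first implication holds because $A_{\mathcal{C}}$ is non-decreasing in $\mathcal{C}$ and the second implication holds
by considering $I' = I_0$. Thus, on the event $\Omega_0$, for any $\mathcal{C} \subset \mathcal{H}$,

\[
|\mathcal{C}^c \cap \mathcal{H}_0(P)| \leq k-1 \implies |(\phi(\mathcal{C}))^c \cap \mathcal{H}_0(P)| \leq k-1.
\]

This leads to the following result.

Theorem 4.5. Assume that a family $\{R_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$ of multiple testing procedures satisfies conditions
(NI) and (FWC$_0$) and consider the corresponding family of non-rejected hypotheses $\{A_{\mathcal{C}}\}_{\mathcal{C} \subset \mathcal{H}}$
and let $\phi$ be defined by (29). Define $\hat{\mathcal{C}}$ by the following "step-down" recursion:

• Initialization: $\mathcal{C}_0 = \mathcal{H}$;

• Step $j \geq 1$: let $\mathcal{C}_j = \phi(\mathcal{C}_{j-1})$. If $\mathcal{C}_j = \mathcal{C}_{j-1}$, let $\hat{\mathcal{C}} = \mathcal{C}_j$ and stop. Otherwise go to step $j+1$.

Then the procedure $R = (\hat{\mathcal{C}})^c$, which also equals $(\phi(\hat{\mathcal{C}}))^c = \bigcap_{|I| = k-1} R_{\hat{\mathcal{C}} \cup I}$, controls the $k$-FWER
at level $\alpha$ for any $P \in \mathcal{P}$.

From (29), $\phi(\cdot)$ is non-decreasing, that is, $\forall \mathcal{C} \subset \mathcal{C}'$, $\phi(\mathcal{C}) \subset \phi(\mathcal{C}')$. As a consequence, we
derive from $\mathcal{C}_1 \subset \mathcal{C}_0$ that $\mathcal{C}_j \subset \mathcal{C}_{j-1}$ for all $j \geq 1$. Therefore, the rejection set can only increase at
each step of the step-down algorithm. In particular, the final procedure $(\hat{\mathcal{C}})^c = \bigcap_{|I| = k-1} R_{\hat{\mathcal{C}} \cup I}$ is
always less conservative than the single-step method $R_{\mathcal{H}}$, for the same $k$-FWER control. Therefore,
using the step-down algorithm never leads to a loss of power.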

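To illustrate Theorem 4.5, let us consider a thresholding-based family of the form $R_{\mathcal{C}} = \{1 \leq
i \leq m : p_i \leq t_{\mathcal{C}}\}$ with a non-increasing threshold function $\mathcal{C} \mapsto t_{\mathcal{C}}$ (i.e., such that for $\mathcal{C} \subset \mathcal{C}'$, we
have $t_{\mathcal{C}'} \leq t_{\mathcal{C}}$) and such that $\{R_{\mathcal{C}}\}_{\mathcal{C}}$ satisfies (FWC$_0$). The recursion relation $\mathcal{C}_j = \phi(\mathcal{C}_{j-1})$ can be
rewritten in that case as follows:

\[
\begin{aligned}
(\mathcal{C}_j)^c &= \bigcap_{I \subset (\mathcal{C}_{j-1})^c,\, |I| \leq k-1} R_{\mathcal{C}_{j-1} \cup I} \\
&= \bigcap_{I \subset (\mathcal{C}_{j-1})^c,\, |I| \leq k-1} \{1 \leq i \leq m : p_i \leq t_{\mathcal{C}_{j-1} \cup I}\} \\
&= \Big\{1 \leq i \leq m : p_i \leq \min_{I \subset (\mathcal{C}_{j-1})^c,\, |I| \leq k-1} \{t_{\mathcal{C}_{j-1} \cup I}\}\Big\}.
\end{aligned}
\]

This recovers the generic step-down method described in Algorithm 2.1 of [45], which was
developed in the case where the subset-indexed family is thresholding based.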

Example 4.6 (Bonferroni step-down procedure for k-FWER control). When we choose the Bonferroni family, i.e., the threshold family $t_{\mathcal{C}} = \alpha k / |\mathcal{C}|$, we have

$$\min_{I \subset \mathcal{C}^c,\ |I| \le k-1} \{t_{\mathcal{C} \cup I}\} = \frac{\alpha k}{m \wedge (|\mathcal{C}| + k - 1)}.$$

Therefore, in terms of the ordered p-values $0 = p_{(0)} \le p_{(1)} \le \dots \le p_{(m)}$, the procedure of Theorem 4.5 can be reformulated as rejecting the null $H_{0,i}$ when $p_i \le \alpha k / (m \wedge (m - \hat\ell + k))$, where $\hat\ell = \max\{\ell \in \{0,1,\dots,m\} : \forall \ell' \le \ell,\ p_{(\ell')} \le \alpha k / (m \wedge (m - \ell' + k))\}$. The latter is the generalized Holm procedure, which was introduced and proved to control the k-FWER in [36].
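As an illustration of Example 4.6, the following minimal Python sketch implements the generalized Holm step-down rule as it is written above. The function name `generalized_holm` and the list-based interface are our own assumptions for this illustration; they do not come from [36] or [45].

```python
def generalized_holm(pvalues, k, alpha):
    """Indices rejected by the generalized Holm step-down rule of Example 4.6.

    Hypothetical helper: step ell compares the ell-th smallest p-value
    with alpha * k / min(m, m - ell + k) and stops at the first failure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])

    # ell_hat = largest ell such that every ell' <= ell passes its threshold.
    ell_hat = 0
    for ell in range(1, m + 1):
        threshold = alpha * k / min(m, m - ell + k)
        if pvalues[order[ell - 1]] <= threshold:
            ell_hat = ell
        else:
            break

    # Reject every hypothesis whose p-value is below the final threshold.
    final_threshold = alpha * k / min(m, m - ell_hat + k)
    return sorted(i for i in range(m) if pvalues[i] <= final_threshold)


pvals = [0.001, 0.2, 0.015, 0.03, 0.9]
holm = generalized_holm(pvals, k=1, alpha=0.05)      # classical Holm (k = 1)
two_fwer = generalized_holm(pvals, k=2, alpha=0.05)  # 2-FWER control
```

Taking k = 1 gives back the classical Holm thresholds $\alpha/(m - \ell + 1)$. Allowing k = 2 relaxes the thresholds, so the rejection set can only grow.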



5. FDP control 



The problem of controlling the FDP has been investigated in many studies, e.g., [36, 59, 43, 15, 45, 17, 46]. We follow here a methodology proposed by Romano and Wolf (2007), see [45]. They proposed to use a family $\{S_k\}_k$ of k-FWER controlling procedures and to choose k so that the corresponding rejection number $|S_k|$ is "sufficiently large". Roughly speaking, choosing k such that $|S_k|$ is larger than $(k-1)/\gamma$ implies that, with high probability,

$$\mathrm{FDP}(S_k, P) = |S_k \cap \mathcal{H}_0(P)| / |S_k| \le (k-1)/|S_k| \le \gamma.$$

Obviously, as it stands, the above reasoning is not rigorous, because the chosen k depends on the data. Theorem 4.1 (i) of [45] establishes that the latter approach leads to a correct FDP control in the asymptotic setting where the sample size available for each test tends to infinity. This can be seen as a Dirac configuration where each p-value corresponding to a false null is equal to zero.

In this section, we propose to reformulate this approach by using as index the rejection number $\ell$ instead of k. Roughly speaking, if we choose $\{R_\ell\}_\ell$ such that each $R_\ell$ controls the $(\lfloor \gamma \ell \rfloor + 1)$-FWER and we choose $\ell$ such that $|R_\ell| \ge \ell$, we obtain that, with high probability,

$$\mathrm{FDP}(R_\ell, P) = |R_\ell \cap \mathcal{H}_0(P)| / |R_\ell| \le \gamma \ell / |R_\ell| \le \gamma.$$

As in the previous paragraph, this argument is not rigorous because the chosen $\ell$ depends on the data. The main task of this section is to make this approach rigorous. This leads to a general result (Theorem 5.2 given in Section 5.2), which covers both Theorem 4.1 (i) of [45] in the "Dirac" setting (see Section 5.4) and the earlier result of [36] (see Section 5.3). As an additional corollary, we derive the FDP control of the quantile-binomial procedure described in Algorithm 8, when the data are assumed to follow the independence model (see Section 5.3).

In this section, the parameter $\gamma$ is fixed once and for all in $(0,1)$.



5.1. Family indexed by rejection numbers 

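Assume that we have at hand a family $\{R_\ell\}_{1 \le \ell \le m}$ of multiple testing procedures and a class of distributions $\mathcal{P}' \subset \mathcal{P}$ satisfying the following properties:

• $R_\ell$ is non-decreasing with respect to $\ell$, that is,

$$\forall \ell \in \{1,\dots,m-1\},\quad R_\ell \subset R_{\ell+1}; \qquad \text{(ND)}$$

• $R_\ell$ controls the $(\lfloor \gamma \ell \rfloor + 1)$-FWER at level $\alpha$ for any $P \in \mathcal{P}'$ such that at most $m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1$ null hypotheses are true, that is,

$$\forall \ell \in \{1,\dots,m\},\ \forall P \in \mathcal{P}' \text{ s.t. } |\mathcal{H}_0(P)| \le m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1,\quad \mathbb{P}(|R_\ell \cap \mathcal{H}_0(P)| \ge \lfloor \gamma \ell \rfloor + 1) \le \alpha; \qquad \text{(FWC)}$$

• for any $P \in \mathcal{P}'$, for any $\ell \in \{1,\dots,m\}$, the false rejection number of $R_\ell$ is independent of the correct rejection numbers of $R_{\ell'}$, for $1 \le \ell' \le m$, that is,

$$\forall P \in \mathcal{P}',\ \forall \ell \in \{1,\dots,m\},\quad |R_\ell \cap \mathcal{H}_0(P)| \text{ is independent of } \{|R_{\ell'} \cap \mathcal{H}_1(P)|,\ 1 \le \ell' \le m\}. \qquad \text{(DA)}$$

In condition (FWC), for any $x \ge 0$, $\lfloor x \rfloor$ denotes the largest integer n such that $n \le x$. Condition (ND) is natural because the index $\ell$ can be interpreted as a rejection number. It is easy to check in the examples below.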

For any $\mathcal{P}' \subset \mathcal{P}$, condition (FWC) is fulfilled by the (single-step or step-down) k-FWER controlling procedures of the previous section when $k = \lfloor \gamma \ell \rfloor + 1$. As a first instance, we can use the (single-step) Bonferroni family using the threshold $\alpha(\lfloor \gamma \ell \rfloor + 1)/m$. Moreover, note that $|\mathcal{H}_0(P)| \le m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1$ in (FWC), thus we can consider the improved threshold

$$t_\ell^{LR} = \frac{\alpha(\lfloor \gamma \ell \rfloor + 1)}{m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1}. \qquad (30)$$

The threshold (30) is slightly larger than the threshold used in Theorem 3.1 of [36] (they used $\lfloor \gamma \ell \rfloor$ instead of $\lfloor \gamma(\ell-1) \rfloor$ in the denominator). As a second instance, we can substantially improve on the above threshold family when we additionally assume that the distribution P of the data lies in the smaller subset $\mathcal{P}'$ of distributions for which the p-values are independent. For this, note that for any such P and for any $t \in [0,1]$, the variable $|\{i \in \mathcal{H}_0(P) : p_i(X) \le t\}|$ is stochastically upper-bounded by a binomial distribution of parameters $|\mathcal{H}_0(P)|$ and t, which in turn is stochastically upper-bounded by a binomial distribution of parameters $m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1$ and t. Therefore, choosing the (deterministic) quantile-based threshold family $(t_\ell^Q)_{1 \le \ell \le m}$ defined by

$$t_\ell^Q = \max\{t \in [0,1] : \mathbb{P}(Z > \gamma \ell) \le \alpha \text{ for } Z \sim \mathcal{B}(m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1,\ t)\} \qquad (31)$$

$$= \max\{t \in [0,1] : q_\ell(t) \le \gamma \ell\},$$

where $q_\ell(\cdot)$ is defined by (8), we obtain a family of thresholding procedures satisfying (FWC) for this independence model. Clearly, since $t_\ell^{LR}$ in (30) is only based upon Markov's inequality, which is in general not accurate for binomial variables, the threshold family $t_\ell^Q$ defined by (31) is substantially larger, as illustrated in Figure 4. Interestingly, we can use more elaborate deviation inequalities to obtain thresholds that are better than $t_\ell^{LR}$ while having a form more explicit than $t_\ell^Q$, see Remark 5.1.
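To make the comparison between the Markov-based threshold (30) and the quantile-based threshold (31) concrete, here is a minimal Python sketch. The function names, the bisection tolerance and the use of `fractions.Fraction` for $\gamma$ (to avoid floating-point issues in the floor function) are assumptions of this illustration. The binomial tail is computed by direct summation, which is only reasonable for moderate m.

```python
from fractions import Fraction
from math import comb, floor


def binom_upper_tail(n, t, j):
    """P(Z >= j) for Z ~ Binomial(n, t); the sum is empty (zero) when j > n."""
    return sum(comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(j, n + 1))


def markov_threshold(ell, m, gamma, alpha):
    """Threshold (30): alpha * (floor(gamma*ell) + 1) / n_ell."""
    n = m - ell + floor(gamma * (ell - 1)) + 1
    return alpha * (floor(gamma * ell) + 1) / n


def quantile_threshold(ell, m, gamma, alpha, tol=1e-10):
    """Threshold (31): largest t with P(Z > gamma*ell) <= alpha, found by bisection."""
    n = m - ell + floor(gamma * (ell - 1)) + 1
    j = floor(gamma * ell) + 1  # Z > gamma*ell  <=>  Z >= floor(gamma*ell) + 1
    if j > n:
        return 1.0  # the tail probability is zero for every t
    lo, hi = 0.0, 1.0  # the tail is 0 at t = 0 and 1 at t = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_upper_tail(n, mid, j) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


m, gamma, alpha = 100, Fraction(1, 5), 0.05
markov = [markov_threshold(ell, m, gamma, alpha) for ell in range(1, m + 1)]
quantile = [quantile_threshold(ell, m, gamma, alpha) for ell in range(1, m + 1)]
```

The bisection relies on the fact that the binomial upper tail is non-decreasing in t. Since (30) follows from Markov's inequality applied to the same binomial variable, each entry of `quantile` should be at least the corresponding entry of `markov`, up to the bisection tolerance.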

Assumption (DA) is a dependence assumption which is typically satisfied in the two following cases:

- each procedure $R_\ell$ uses a deterministic threshold and the p-values associated to true nulls are independent of the p-values associated to false nulls, for all distributions of $\mathcal{P}'$, that is,

$$\forall \ell \in \{1,\dots,m\},\ R_\ell = \{i \in \{1,\dots,m\} : p_i \le t_\ell\} \text{ for a deterministic } t_\ell \in [0,1],$$
$$\text{and } \forall P \in \mathcal{P}',\ (p_i(X))_{i \in \mathcal{H}_0(P)} \text{ is independent of } (p_i(X))_{i \in \mathcal{H}_1(P)}; \qquad \text{(DA')}$$

- for all distributions of $\mathcal{P}'$, the number of correct rejections of each $R_\ell$ is deterministic, that is,

$$\forall P \in \mathcal{P}',\quad \{|R_{\ell'} \cap \mathcal{H}_1(P)|,\ 1 \le \ell' \le m\} \text{ is deterministic}. \qquad \text{(DA'')}$$

Condition (DA'') is satisfied for instance when $\mathcal{H}_1(P) \subset R_\ell$ for any $\ell$, which is the case for procedures of the form $R_\ell = \{i \in \{1,\dots,m\} : p_i \le t_\ell(p)\}$ using a possibly data-dependent threshold $t_\ell(p) \in [0,1]$, when we assume that the p-values are in the Dirac configuration, that is, when they are equal to zero under the alternative.

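Remark 5.1. Using Hoeffding's and Bennett's inequalities (see, e.g., Proposition 2.7 and 2.8 in [38]), we can derive a family of thresholding procedures satisfying (FWC) under independence of the p-values, by using the threshold

$$t_\ell^{HB} = \max(t_\ell^{H}, t_\ell^{B}), \qquad (32)$$

where we let

$$t_\ell^{H} = \left( \frac{\lfloor \gamma \ell \rfloor + 1}{m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1} - \left( \frac{\log(1/\alpha)}{2(m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1)} \right)^{1/2} \right) \vee 0,$$

$$t_\ell^{B} = \frac{\lfloor \gamma \ell \rfloor + 1}{m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1}\ h^{-1}\!\left( \frac{\log(1/\alpha)}{\lfloor \gamma \ell \rfloor + 1} \right),$$

with $h(u) = u - \log(u) - 1$, $u \in (0,1]$.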




Figure 4. Threshold $t_\ell^Q$ in (31) under independence (solid line), threshold in (32) under independence (dotted line) and threshold $t_\ell^{LR}$ in (30) for model $\mathcal{P}$ (dashed line) as a function of $\ell \in \{1,\dots,m\}$. m = 100; $\gamma$ = 0.2. Right: $\alpha$ = 0.5; left: $\alpha$ = 0.05.



5.2. Step-down method 



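The approach described in this section is an adaptation of the proof of Theorem 3.1 in [36] to our setting. Let us consider a family $\{R_\ell\}_{1 \le \ell \le m}$ and a class of distributions $\mathcal{P}' \subset \mathcal{P}$ satisfying (ND)-(FWC)-(DA). We aim at selecting $\ell = \hat\ell$ that provides $\forall P \in \mathcal{P}'$, $\mathbb{P}(\mathrm{FDP}(R_{\hat\ell}, P) > \gamma) \le \alpha$.

First note that, by definition of the FDP, we have for any $\ell \in \{1,\dots,m\}$ such that $|R_\ell| = \ell$:

$$\{\mathrm{FDP}(R_\ell, P) > \gamma\} = \{|\mathcal{H}_0(P) \cap R_\ell| > \gamma \ell\}$$

$$= \{|\mathcal{H}_0(P) \cap R_\ell| \ge \lfloor \gamma \ell \rfloor + 1\}$$

$$= \{\ell \in \mathcal{L}\}, \qquad (33)$$

where $\mathcal{L} = \{\ell \in \{1,\dots,m\} : \ell - |\mathcal{H}_1(P) \cap R_\ell| \ge \lfloor \gamma \ell \rfloor + 1\}$ is a set which only depends on the set $\{|\mathcal{H}_1(P) \cap R_{\ell'}|,\ 1 \le \ell' \le m\}$.

Second, note that for any $\ell \in \{1,\dots,m\}$ such that $|R_\ell| \ge \ell$,

$$\{\ell \in \mathcal{L}\} \subset \{|\mathcal{H}_0(P) \cap R_\ell| \ge \lfloor \gamma \ell \rfloor + 1\}. \qquad (34)$$

Let us consider $\ell^* = \min\{\mathcal{L}\}$ (with $\ell^* = m+1$ when $\mathcal{L} = \emptyset$). From (33) and (34), taking $\hat\ell \in \{1,\dots,m\}$ such that $|R_{\hat\ell}| = \hat\ell$ and such that for any $\ell \le \hat\ell$, $|R_\ell| \ge \ell$, we obtain

$$\{\mathrm{FDP}(R_{\hat\ell}, P) > \gamma\} \subset \{\ell^* \le \hat\ell\}$$

$$\subset \{|\mathcal{H}_0(P) \cap R_{\ell^*}| \ge \lfloor \gamma \ell^* \rfloor + 1\}.$$

Moreover, if $\ell^* \ge 2$, by definition of $\ell^*$, we have $\ell^* - 1 \notin \mathcal{L}$. Hence, we obtain the following upper-bound for $|\mathcal{H}_0(P)|$:

$$|\mathcal{H}_0(P)| = m - |\mathcal{H}_1(P)| \le m - |\mathcal{H}_1(P) \cap R_{\ell^*-1}| \le m - \ell^* + \lfloor \gamma(\ell^*-1) \rfloor + 1.$$

Since the above bound is also true when $\ell^* = 1$, it holds for any possible value of $\ell^*$.

Finally, noting that $\ell^*$ only depends on the variable set $\{|\mathcal{H}_1(P) \cap R_{\ell'}|,\ 1 \le \ell' \le m\}$ and using (FWC)-(DA), we have proved that for any $\ell \in \{1,\dots,m\}$,

$$\mathbb{P}(\mathrm{FDP}(R_{\hat\ell}, P) > \gamma \mid \ell^* = \ell) \le \mathbb{P}(|\mathcal{H}_0(P) \cap R_\ell| \ge \lfloor \gamma \ell \rfloor + 1 \mid \ell^* = \ell)$$

$$= \mathbb{P}(|\mathcal{H}_0(P) \cap R_\ell| \ge \lfloor \gamma \ell \rfloor + 1)$$

$$\le \alpha.$$

Also, the probability $\mathbb{P}(\mathrm{FDP}(R_{\hat\ell}, P) > \gamma \mid \ell^* = m+1)$ is zero, because it is smaller than $\mathbb{P}(\hat\ell \in \mathcal{L} \mid \ell^* = m+1)$. This leads to the following result.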

Theorem 5.2. Assume that there exists a family $\{R_\ell\}_{1 \le \ell \le m}$ of multiple testing procedures and a class of distributions $\mathcal{P}' \subset \mathcal{P}$ satisfying the conditions (ND)-(FWC)-(DA) defined in Section 5.1. Consider the procedure $R_{\hat\ell}$ where

$$\hat\ell = \max\{\ell \in \{0,\dots,m\} : \forall \ell' \in \{0,\dots,\ell\},\ |R_{\ell'}| \ge \ell'\}, \qquad (35)$$

(with the convention $R_0 = \emptyset$). Then $R_{\hat\ell}$ controls the FDP in the following sense:

$$\forall P \in \mathcal{P}',\quad \mathbb{P}(\mathrm{FDP}(R_{\hat\ell}, P) > \gamma) \le \alpha. \qquad (36)$$

The algorithm performed to find (35) is a step-down algorithm; it starts from small rejection numbers and stops the first time that $|R_\ell|$ is below $\ell$. Note that the maximum in (35) is well defined because $\ell = 0$ satisfies $|R_\ell| \ge \ell$. Furthermore, using (ND), relation (35) implies $\hat\ell \le |R_{\hat\ell}| \le |R_{\hat\ell+1}| < \hat\ell + 1$, so that $|R_{\hat\ell}| = |R_{\hat\ell+1}| = \hat\ell$ holds. As a consequence, the procedure of Theorem 5.2 can be equivalently defined by $R_{\tilde\ell}$ where

$$\tilde\ell = \min\{\ell \in \{1,\dots,m+1\} : |R_\ell| \le \ell - 1\}, \qquad (37)$$

with the convention $R_{m+1} = R_m$ (so that the minimum in (37) is well defined).
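The two index definitions (35) and (37) can be sketched in Python as follows. This is a minimal illustration, assuming the family is summarised by the list of its rejection numbers $|R_1|, \dots, |R_m|$ (non-decreasing by (ND)). The function names are hypothetical.

```python
def step_down_index_max(rejection_counts):
    """Index (35): largest ell such that |R_l'| >= l' for every l' <= ell.

    rejection_counts[l - 1] holds |R_l| for l = 1, ..., m; R_0 is empty.
    """
    ell_hat = 0
    for ell, count in enumerate(rejection_counts, start=1):
        if count >= ell:
            ell_hat = ell
        else:
            break  # step-down: stop at the first ell with |R_ell| < ell
    return ell_hat


def step_down_index_min(rejection_counts):
    """Index (37): smallest ell in {1, ..., m+1} with |R_ell| <= ell - 1.

    Uses the convention R_{m+1} = R_m, so the minimum always exists.
    Assumes m >= 1.
    """
    m = len(rejection_counts)
    extended = list(rejection_counts) + [rejection_counts[-1]]
    return min(ell for ell in range(1, m + 2) if extended[ell - 1] <= ell - 1)


counts = [2, 2, 3, 3, 3, 3]            # |R_1|, ..., |R_6|, non-decreasing
ell_max = step_down_index_max(counts)  # index from (35)
ell_min = step_down_index_min(counts)  # index from (37)
```

On this input the index from (37) is one more than the index from (35), and the two selected procedures have the same number of rejections, in line with the remark following Theorem 5.2.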
5.3. Theorem 3.1 of [36] and the quantile-binomial procedure as corollaries

Going back to the specific setting (DA') described in Section 5.1, we may derive from Theo- 
rem 5.2 the following corollary. 

Corollary 5.3. Let us consider the deterministic threshold family $(t_\ell^{LR})_{1 \le \ell \le m}$ defined by (30) and consider

$$\hat\ell = \max\{\ell \in \{0,\dots,m\} : \forall \ell' \in \{0,\dots,\ell\},\ p_{(\ell')} \le t_{\ell'}^{LR}\}, \qquad (38)$$

where $0 = p_{(0)} \le p_{(1)} \le \dots \le p_{(m)}$ denote the ordered p-values and by convention $t_0^{LR} = 0$. Then the procedure $\{i \in \{1,\dots,m\} : p_i \le t_{\hat\ell}^{LR}\}$ satisfies the FDP control (36) for the subset of distributions $P \in \mathcal{P}$ such that the family $(p_i(X))_{i \in \mathcal{H}_0(P)}$ is independent of the family $(p_i(X))_{i \in \mathcal{H}_1(P)}$.

By reproducing the end of the proof of Theorem 5.2 in the particular setting of Corollary 5.3, we may slightly enlarge the distribution set in Corollary 5.3 to the set of $P \in \mathcal{P}$ such that for any $i \in \mathcal{H}_0(P)$, $\forall u \in [0,1]$, $\mathbb{P}(p_i(X) \le u \mid (p_j(X))_{j \in \mathcal{H}_1(P)}) \le u$. This is the distributional setting of Theorem 3.1 of [36]. Hence, we are able to recover the latter result (with a slight improvement in the threshold family).

Furthermore, if we want to ensure the FDP control (36) only for the smaller set of distributions with independent p-values, we may consider the larger threshold family $(t_\ell^Q)_{1 \le \ell \le m}$ defined by (31). This gives rise to the step-down procedure

$$R^Q = \{i \in \{1,\dots,m\} : p_i \le t_{\hat\ell}^Q\}, \qquad (39)$$

where $\hat\ell = \max\{\ell \in \{0,\dots,m\} : \forall \ell' \in \{0,\dots,\ell\},\ p_{(\ell')} \le t_{\ell'}^Q\}$ (with $t_0^Q = 0$). The latter is the procedure described in Algorithm 1.7, because $p_{(\ell)} \le t_\ell^Q$ if and only if $q_\ell(p_{(\ell)}) \le \gamma \ell$, with $q_\ell(\cdot)$ defined by (8). As a consequence, Theorem 5.2 provides the result announced in Section 1.7.

Corollary 5.4. For any $\gamma, \alpha \in (0,1)$, the quantile-binomial procedure $R^Q$ described in Algorithm 1.7, or equivalently in (39), controls the FDP in the following way: for any distribution P under which the p-values are independent,

$$\mathbb{P}(\mathrm{FDP}(R^Q, P) > \gamma) \le \alpha.$$

In particular, the median-binomial procedure $R^Q$ (using $\alpha = 1/2$) ensures that the median of the distribution of $\mathrm{FDP}(R^Q, P)$ is controlled at level $\gamma$ for any such P.

To our knowledge, the above result is a new finding. It establishes an FDP control which is substantially better suited to the case of independent p-values than the procedure of [36]. Further comments on this procedure can be found in Section 6.3.
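The quantile-binomial procedure can be sketched without computing the thresholds $t_\ell^Q$ explicitly. Since the binomial upper tail is non-decreasing in its success probability, $p_{(\ell)} \le t_\ell^Q$ holds exactly when $\mathbb{P}(Z > \gamma\ell) \le \alpha$ for $Z \sim \mathcal{B}(m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1,\ p_{(\ell)})$. The Python sketch below uses this check at each step. The function names and the choice of passing $\gamma$ as a `Fraction` are assumptions of this illustration, not part of Algorithm 1.7.

```python
from fractions import Fraction
from math import comb, floor


def binom_upper_tail(n, t, j):
    """P(Z >= j) for Z ~ Binomial(n, t); zero when j > n."""
    return sum(comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(j, n + 1))


def quantile_binomial(pvalues, gamma, alpha):
    """Indices rejected by a step-down rule in the spirit of (39).

    Step ell accepts the ell-th smallest p-value p when
    P(Binomial(n_ell, p) >= floor(gamma*ell) + 1) <= alpha,
    with n_ell = m - ell + floor(gamma*(ell - 1)) + 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    ell_hat = 0
    for ell in range(1, m + 1):
        n = m - ell + floor(gamma * (ell - 1)) + 1
        j = floor(gamma * ell) + 1
        if binom_upper_tail(n, pvalues[order[ell - 1]], j) <= alpha:
            ell_hat = ell
        else:
            break
    # The thresholds are non-decreasing in ell, so the rejected set is
    # exactly the ell_hat smallest p-values.
    return sorted(order[:ell_hat])


pvals = [0.001, 0.6, 0.004, 0.8, 0.02, 0.5, 0.03, 0.9, 0.7, 0.95]
median_rule = quantile_binomial(pvals, Fraction(1, 5), alpha=0.5)
strict_rule = quantile_binomial(pvals, Fraction(1, 5), alpha=0.05)
```

With $\alpha = 1/2$ this is the median-binomial rule. Lowering $\alpha$ makes each step harder to pass, so the rejection set can only shrink.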

5.4. Theorem 4.1 (i) of [45] as a corollary 

In Section 4 of [45], a step-down procedure $S_{\hat k}$ is defined from a generic family $\{S_k\}_{1 \le k \le m}$ of thresholding based procedures. The latter family is assumed to be such that each $S_k$ controls the k-FWER for $1 \le k \le m$ and $S_k \subset S_{k+1}$ for $1 \le k \le m-1$. The index $\hat k$ is obtained as follows:

$$\hat k = \min\{k \in \{1,\dots,m+1\} : \gamma |S_k| < k - \gamma\}, \qquad (40)$$

where we use here the convention $S_{m+1} = S_m$ (so that the above set always contains $k = m+1$). Theorem 4.1 (i) of [45] states that $S_{\hat k}$ controls the FDP in the asymptotic sense, as the sample size available to perform each test tends to infinity. This can be seen as a (non-asymptotic) FDP control in a Dirac configuration where the p-values corresponding to false nulls are equal to zero. Set under this form, Theorem 4.1 (i) of [45] can be derived from Theorem 5.2.

For this, let $R_\ell = S_{\lfloor \gamma \ell \rfloor + 1}$, for $\ell \in \{1,\dots,m\}$, and note that the family $\{R_\ell\}_{1 \le \ell \le m}$ satisfies (ND)-(FWC) and (DA''), by taking the distribution set $\mathcal{P}'$ corresponding to Dirac configurations for the p-values. Hence, Theorem 5.2 establishes the FDP control for the Dirac configurations of the procedure $R_{\hat\ell}$ where $\hat\ell$ is defined by (35), or equivalently of $R_{\tilde\ell}$ where $\tilde\ell$ is defined by (37). Thus, it only remains to show that the step-down algorithms (40) and (37) lead to the same procedure, that is, $S_{\hat k} = R_{\tilde\ell}$.

To prove the latter, we establish $\hat k = \lfloor \gamma \tilde\ell \rfloor + 1$. First, using (37), $\tilde\ell$ satisfies $\gamma |S_{\lfloor \gamma \tilde\ell \rfloor + 1}| \le \gamma \tilde\ell - \gamma$. Since $\gamma \tilde\ell < \lfloor \gamma \tilde\ell \rfloor + 1$, we deduce from the definition of $\hat k$ that $\lfloor \gamma \tilde\ell \rfloor + 1 \ge \hat k$. Conversely, by considering the unique integer $\ell \in \{1,\dots,m\}$ satisfying $\hat k/\gamma - 1 \le \ell < \hat k/\gamma$ and thus also $\lfloor \gamma \ell \rfloor + 1 = \hat k$, we have that for any integer j, $\gamma j < \hat k \Rightarrow j \le \ell$. Applying the latter for $j = |S_{\hat k}| + 1$, we obtain from $\gamma(|S_{\hat k}| + 1) < \hat k$ that $|S_{\hat k}| \le \ell - 1$ and thus $\ell \ge \tilde\ell$, by using the definition of $\tilde\ell$. This in turn implies $\hat k \ge \lfloor \gamma \tilde\ell \rfloor + 1$. We thus have proved the following result, which can be seen as Theorem 4.1 (i) of [45] in the Dirac setting.

Corollary 5.5. Assume that there exists a family $\{S_k\}_{1 \le k \le m}$ of multiple testing procedures (with the convention $S_{m+1} = S_m$) satisfying

- for each $k \in \{1,\dots,m\}$, $S_k$ is of the form $\{i \in \{1,\dots,m\} : p_i \le t_k(p)\}$ for a possibly data-dependent threshold $t_k(\cdot) \in [0,1]$;

- for each $k \in \{1,\dots,m-1\}$, $S_k \subset S_{k+1}$;

- for each $k \in \{1,\dots,m\}$, $\forall P \in \mathcal{P}$, k-FWER$(S_k, P) \le \alpha$.

Consider $\hat k$ defined in (40) and the subset $\mathcal{P}'$ of distributions $P \in \mathcal{P}$ corresponding to a Dirac configuration, i.e., such that $\forall P \in \mathcal{P}'$, $\forall i \in \mathcal{H}_1(P)$, $p_i(x) = 0$ for P-almost every $x \in \mathcal{X}$. Then we have $\forall P \in \mathcal{P}'$, $\mathbb{P}(\mathrm{FDP}(S_{\hat k}, P) > \gamma) \le \alpha$.

6. Discussion 

6.1. Complexity of the k-FWER step-down approach 

One major limitation of the k-FWER approach presented in Section 4 is that the computation of $\phi(\cdot)$ in (29) can become cumbersome when k is large, because we should consider all subsets I of $\mathcal{C}^c$ of cardinality $k-1$ (say that $|\mathcal{C}^c| \ge k-1$). However, we may modify this algorithm by considering only the set I equal to the $k-1$ indexes of $\mathcal{C}^c$ corresponding to the $k-1$ largest p-values in $\{p_i, i \in \mathcal{C}^c\}$. As noted in [45], this "streamlined" step-down procedure still controls the k-FWER in the Dirac model where each false null has a p-value equal to zero. The latter is true because in this model, as soon as $|\mathcal{C}^c \cap \mathcal{H}_0(P)| \le k-1$, we know that the set $\mathcal{C}^c \cap \mathcal{H}_0(P)$ is included in the set I of indexes corresponding to the $k-1$ largest p-values in $\{p_i, i \in \mathcal{C}^c\}$ (because the p-values of $\{p_i, i \in \mathcal{C}^c \cap \mathcal{H}_1(P)\}$ are zero). Nevertheless, no proof of this k-FWER control stands without this Dirac assumption.
6.2. FDR control is not FDP control

Since the only interpretable variable is the FDP and not its expectation, controlling the FDR is meaningful only when the FDP concentrates well around the FDR. As the hypothesis number m grows, Neuvial (2008) showed that the latter holds for step-up type procedures when a Donsker type theorem for the e.c.d.f. is valid, so for instance under independence or "weak" dependence, see [41]. However, under some unspecified dependencies, we do not know how the FDP concentrates. For instance, even under a very simple $\rho$-equi-correlated Gaussian model (corresponding to Example 1.2, where the non-diagonal entries of $\Sigma(P)$ are all equal to $\rho$), it was shown in [16] that the convergence rate of the FDP to the FDR can be arbitrarily slow when $\rho = \rho_m$ tends to zero as m tends to infinity. Additionally, it was proved in [24] that no concentration phenomenon occurs when $\rho$ is kept fixed with m. Also, as shown in [48], "sparsity" ($\pi_0(P) = \pi_{0,m}(P)$ tends to 1 as m tends to infinity) is another feature that can slow down the FDP convergence. Therefore, in all these cases, the FDP converges slowly or not at all, and controlling the FDR does not lead to a clear interpretation for the underlying FDP. This drawback does not arise when controlling the FDP upper-tail distribution: for instance, the FDP control $\mathbb{P}(\mathrm{FDP} > 0.01) \le 0.5$ ensures that, with probability at least 0.5, the FDP is below 0.01, and this interpretation holds whatever the FDP distribution is. However, the FDR remains useful, because it is a simpler criterion for which the controlling methodology is (for now) much more developed than the FDP controlling methodology.

6.3. Quantile-binomial procedure and relation to previous work 

Let us consider the quantile-binomial procedure defined in Algorithm 1.7 and the quantile function $q_\ell(\cdot)$ defined by (8). In the particular case where we take $\alpha = 1/2$, the procedure is called the median-binomial procedure and Corollary 5.4 shows that it controls the median of the FDP at level $\gamma$ under independence of the p-values. Interestingly, in the "Gaussian regime" where the underlying binomial variable is close to a Gaussian variable (say, $\gamma$ not too small, many rejections), the median is close to the expectation and thus $q_\ell(t) \simeq (m - \ell + \lfloor \gamma(\ell-1) \rfloor + 1)t \simeq (m - (1-\gamma)\ell + 1)t$. Hence, in this case, the median-binomial procedure is close to the step-down procedure using the thresholding $t_\ell = \gamma \ell / (m - (1-\gamma)\ell + 1)$. As a matter of fact, the latter procedure has been recently introduced by Gavrilov et al. (2009) and it has been proved to control the FDR under independence, see [26]. Roughly speaking, the latter may be interpreted in our framework as a "mean-binomial procedure". However, in the Poisson regime (say, $\gamma$ small, few rejections), the median-binomial procedure can be substantially different from the procedure of Gavrilov et al. (2009). Hence, we should keep in mind that the two procedures do not control the same error rate. These different remarks are illustrated in Figure 5, where we have also reported the Benjamini-Hochberg threshold.

6.4. Conclusion 

In this paper, we have recovered some of the classical state-of-the-art multiple testing procedures for controlling the FDR, the k-FWER and the FDP. Additionally, some new contributions were given for k-FWER and FDP control, by extending and unifying some previous work of the multiple testing literature and by finding a novel procedure, based on the quantiles of the binomial distribution, which controls the FDP under independence.

Figure 5. Comparison between the Benjamini-Hochberg thresholding $t_\ell = \gamma \ell / m$ (dashed-dotted), the Gavrilov et al. thresholding $t_\ell = \gamma \ell / (m - (1-\gamma)\ell + 1)$ (dashed) and the quantile thresholding $t_\ell^Q$ defined by (31) with $\alpha = 0.5$ (solid) as a function of $\ell$. m = 100; Top: $\gamma$ = 0.01; Bottom: $\gamma$ = 0.1. Each right picture is a zoom of the left picture into the region $\ell \in \{1,\dots,80\}$ (top) or $\ell \in \{1,\dots,50\}$ (bottom).

The type I error rate control research area still has many unsolved issues. Among the major concerns, the FDP control in Section 5 needs a very strong distributional assumption on the test statistics, namely independence or the "Dirac" assumption. To our knowledge, no procedure adaptive to dependencies has been proved to control the FDP without such a strong requirement. This leaves room for future developments, which would have a strong impact on high-dimensional data analysis.
Appendix A: Defining a p-value from a test statistic

Let us consider the problem of testing a (single) hypothesis $H_0$: "$P \in \Theta_0$" from a test statistic $S(X)$. Assume that $H_0$ should be rejected for "large" values of $S(X)$. We let $T_P(s) = \mathbb{P}_{X \sim P}(S(X) \ge s)$, $F_P(s) = \mathbb{P}_{X \sim P}(S(X) \le s)$ and $F_P^{-1}(v) = \min\{s \in \mathbb{R} \cup \{-\infty\} : F_P(s) \ge v\}$. The following result is elementary and can be considered as well known. It is strongly related to Theorem 10.12 in [61], Lemma 3.3.1 in [37] (see also Problem 3.23 therein) and Proposition 1.2 in [17].

Proposition A.1. The p-value $p(X) = \sup_{P \in \Theta_0} T_P(S(X))$ satisfies the following:

(i) $p(X)$ is stochastically lower-bounded by a uniform variable under the null, that is,

$$\forall P \in \Theta_0,\ \forall u \in [0,1],\quad \mathbb{P}_{X \sim P}(p(X) \le u) \le u.$$

(ii) if for any $P \in \Theta_0$, $F_P$ is continuous, we have for any realization x of X,

$$p(x) = \min\{\alpha \in [0,1] : S(x) \ge \sup_{P \in \Theta_0} F_P^{-1}(1-\alpha)\}.$$

If additionally $\Theta_0$ is a singleton, $p(X) \sim U(0,1)$ whenever $P \in \Theta_0$.

(iii) if for any $P \in \Theta_0$, the variable $S(X)$ takes its values in a discrete set with probability 1, we have for any realization x of X,

$$p(x) = \min\{\alpha \in [0,1] : S(x) > \sup_{P \in \Theta_0} F_P^{-1}(1-\alpha)\}.$$

In particular, if $S(X)$ is an integer random variable, we have for any x such that $S(x) \in \mathbb{N}$,

$$p(x) = \min\{\alpha \in [0,1] : S(x) \ge \sup_{P \in \Theta_0} F_P^{-1}(1-\alpha) + 1\}.$$

A consequence is that the two classical definitions of a p-value are compatible in the following way.

Corollary A.2. Assume that there exists $Q \in \Theta_0$ such that for any $P \in \Theta_0$, for all $s \in \mathbb{R}$, $F_P(s) \ge F_Q(s)$. Let $p(X) = T_Q(S(X))$ and consider the families of tests $\{\phi_\alpha\}_{\alpha \in [0,1]}$ and $\{\phi'_\alpha\}_{\alpha \in [0,1]}$, where $\phi_\alpha(x) = \mathbf{1}\{S(x) \ge F_Q^{-1}(1-\alpha)\}$ and $\phi'_\alpha(x) = \mathbf{1}\{S(x) > F_Q^{-1}(1-\alpha)\}$. Then the following holds.

(i) if $F_Q$ is continuous, the tests $\phi_\alpha$ and $\phi'_\alpha$ are of level $\alpha$ for all $\alpha \in [0,1]$ and we have for any realization x of X,

$$[p(x), 1] = \{\alpha \in [0,1] : \phi_\alpha(x) = 1\},$$

and for Q-almost every x,

$$(p(x), 1] = \{\alpha \in [0,1] : \phi'_\alpha(x) = 1\}.$$

(ii) if for $X \sim Q$ the variable $S(X)$ takes its values in a discrete set with probability 1, the test $\phi'_\alpha$ is of level $\alpha$ while the test $\phi_\alpha$ is not of level $\alpha$, for all $\alpha \in [0,1]$, and we have for any realization x of X,

$$[p(x), 1] = \{\alpha \in [0,1] : \phi'_\alpha(x) = 1\}.$$

In particular, we have both in the continuous and discrete case that for Q-almost every x,

$$p(x) = \inf\{\alpha \in [0,1] : \phi'_\alpha(x) = 1\}.$$

Proof. From Proposition A.1 (ii) and (iii), the only assertion to be proved is that for all $\alpha \in [0,1]$, for Q-almost every x, $S(x) > F_Q^{-1}(1-\alpha) \Leftrightarrow p(x) < \alpha$. Let us denote $\mathcal{S} = \{F_Q^{-1}(1-\alpha),\ \alpha \in [0,1]\}$. Since $F_Q$ is increasing on $\mathcal{S}$, the desired relation is provided for $S(x) \in \mathcal{S}$. We can conclude because $\mathbb{P}_{X \sim Q}(S(X) \in \mathcal{S}) = 1$. □

Example A.3. To illustrate (i) and (iii) of Proposition A.1, let us consider the following simple discrete testing setting (coming from Example 3.3.2 in [37]). Let $H_0$: "$P = P_0$" where $P_0$ is the uniform distribution on $\{1,\dots,10\}$ and consider the test statistic $S(X) = X$. We easily see that the p-value $T_{P_0}(X)$ is $p(X) = (11 - X)/10$. It satisfies $\mathbb{P}(p(X) \le u) \le u$, with equality iff u can be written under the form $i/10$ for some integer i, $1 \le i \le 10$. Furthermore, rejecting $H_0$ for $p(X) \le \alpha$ is equivalent to rejecting $H_0$ whenever $X \ge k(\alpha)$, where $k(\alpha)$ is the unique integer satisfying $(11 - k(\alpha))/10 \le \alpha < (12 - k(\alpha))/10$. We merely check that $k(\alpha) = \lceil 10(1-\alpha) \rceil + 1$.
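The claims of Example A.3 can be checked by exhaustive enumeration. The short Python sketch below is only an illustration: the function names are hypothetical, exact rational arithmetic is used to avoid rounding at the boundary values, and the closed form $\lceil 10(1-\alpha) \rceil + 1$ for the critical value is our reading of the defining double inequality.

```python
from fractions import Fraction
from math import ceil

SUPPORT = range(1, 11)  # X is uniform on {1, ..., 10} under H_0


def p_value(x):
    """p(x) = P_0(X >= x) = (11 - x) / 10."""
    return Fraction(11 - x, 10)


def prob_p_value_at_most(u):
    """P_0(p(X) <= u), computed by enumerating the support."""
    return Fraction(sum(1 for x in SUPPORT if p_value(x) <= u), 10)


def critical_value(alpha):
    """Integer k(alpha) with (11 - k)/10 <= alpha < (12 - k)/10."""
    return ceil(10 * (1 - alpha)) + 1


grid = [Fraction(n, 100) for n in range(1, 101)]

# Proposition A.1 (i): P(p(X) <= u) <= u, with equality iff u = i/10.
conservative = all(prob_p_value_at_most(u) <= u for u in grid)
equality_points = [u for u in grid if prob_p_value_at_most(u) == u]

# Rejecting when p(X) <= alpha is the same as rejecting when X >= k(alpha).
same_rejection_region = all(
    {x for x in SUPPORT if p_value(x) <= a}
    == {x for x in SUPPORT if x >= critical_value(a)}
    for a in grid
)
```

For instance, with $\alpha = 1/4$ the critical value is 9, so only the outcomes 9 and 10 lead to rejection, and the actual level of the test is 0.2 rather than 0.25.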

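Finally, we provide a proof for Proposition A.1.

Proof. Let $F_P(s^-) = \mathbb{P}_{X \sim P}(S(X) < s)$ denote the left limit of $F_P$ at s, and let us first state the following result: for any P, for any $\alpha \in [0,1]$,

$$\{T_P(S(X)) \le \alpha\} = \begin{cases} \{S(X) \ge F_P^{-1}(1-\alpha)\} & \text{if } F_P(F_P^{-1}(1-\alpha)^-) = 1-\alpha, \\ \{S(X) > F_P^{-1}(1-\alpha)\} & \text{otherwise.} \end{cases} \qquad (41)$$

To establish (41), first note that $\{T_P(S(X)) \le \alpha\} = \{F_P(S(X)^-) \ge 1-\alpha\} \subset \{S(X) \ge F_P^{-1}(1-\alpha)\}$, by definition of $F_P^{-1}(1-\alpha)$. On the one hand, if $F_P(F_P^{-1}(1-\alpha)^-) = 1-\alpha$, we have $\{S(X) \ge F_P^{-1}(1-\alpha)\} \subset \{F_P(S(X)^-) \ge F_P(F_P^{-1}(1-\alpha)^-)\} = \{F_P(S(X)^-) \ge 1-\alpha\}$. On the other hand, if $F_P(F_P^{-1}(1-\alpha)^-) < 1-\alpha$, we have $\{F_P(S(X)^-) \ge 1-\alpha\} \subset \{S(X) > F_P^{-1}(1-\alpha)\}$ and $\{S(X) > F_P^{-1}(1-\alpha)\} \subset \{F_P(S(X)^-) \ge F_P(F_P^{-1}(1-\alpha))\} \subset \{F_P(S(X)^-) \ge 1-\alpha\}$. This proves (41).

Let us now prove (i). We have for any $P \in \Theta_0$, $\mathbb{P}_{X \sim P}(p(X) \le \alpha) \le \mathbb{P}_{X \sim P}(T_P(S(X)) \le \alpha)$. Next, applying (41), we have if $F_P(F_P^{-1}(1-\alpha)^-) = 1-\alpha$,

$$\mathbb{P}_{X \sim P}(p(X) \le \alpha) \le \mathbb{P}_{X \sim P}(S(X) \ge F_P^{-1}(1-\alpha)) = 1 - F_P(F_P^{-1}(1-\alpha)^-) = \alpha,$$

and if $F_P(F_P^{-1}(1-\alpha)^-) < 1-\alpha$,

$$\mathbb{P}_{X \sim P}(p(X) \le \alpha) \le \mathbb{P}_{X \sim P}(S(X) > F_P^{-1}(1-\alpha)) = 1 - F_P(F_P^{-1}(1-\alpha)) \le \alpha.$$

Assume now that for any $P \in \Theta_0$, $F_P$ is continuous, and prove (ii). In this case, $F_P(F_P^{-1}(1-\alpha)^-) = F_P(F_P^{-1}(1-\alpha)) = 1-\alpha$ for any $\alpha \in [0,1]$, so that (41) provides that $\{T_P(S(X)) \le \alpha\} = \{S(X) \ge F_P^{-1}(1-\alpha)\}$. Hence, we obtain for any realization x of X,

$$p(x) = \min\{\alpha \in [0,1] : \forall P \in \Theta_0,\ T_P(S(x)) \le \alpha\}$$

$$= \min\{\alpha \in [0,1] : \forall P \in \Theta_0,\ S(x) \ge F_P^{-1}(1-\alpha)\},$$

which leads to the desired result.

For (iii), the proof is similar by noting that $F_P(F_P^{-1}(1-\alpha)^-) < 1-\alpha$ in the case where the distribution of $S(X)$ has a discrete support under the null. □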